Professor, SAHIST, Sungkyunkwan University
Director, Digital Healthcare Institute
Yoon Sup Choi, Ph.D.
When Digital Medicine Becomes the Medicine
Disclaimer
I disclose that I have financial interests in the companies above, through equity stakes, advisory roles, and similar relationships.
Startups
Venture capital
“It's in Apple's DNA that technology alone is not enough. It's technology married with liberal arts.”
The Convergence of IT, BT and Medicine
Medical Artificial Intelligence
Written by Yoon Sup Choi
Cover design: Seung-hyup Choi
Yoon Sup Choi
Medical AI is driving an innovation that will reshape the conservative healthcare system. Its rapid progress and broad impact are hard to grasp for modern medical professionals, who have trained in ever more specialized and subdivided fields, and it is unclear where one should even begin studying. In this situation, this book, which lucidly explains the concepts and applications of medical AI and its relationship with physicians, will serve as an excellent guide. It is an especially useful introduction for the medical students and young clinicians who will lead the future.
— Joon Beom Seo, Professor of Radiology, Asan Medical Center; Director, Medical Imaging AI Research Center
Few would disagree that AI will profoundly change the paradigm of medicine. But medicine poses many hard problems for AI to solve, and the solutions vary enormously. The cure-all medical AI that people commonly imagine does not exist. This book offers a balanced analysis of the development, application, and potential of a wide range of medical AI. I recommend it both to clinicians looking to adopt AI and to AI researchers venturing into the unfamiliar territory of medicine.
— Jihoon Jeong, Senior Lecturing Professor of Media Communication, Kyung Hee Cyber University; physician
As a professor responsible for basic medical education at Seoul National University College of Medicine, I keenly feel that today's medical education, unchanged since industrialization, cannot prepare medical students for the rapidly changing era of AI. This book carries the expert analysis and forward-looking perspective of Director Yoon Sup Choi, with whom I am pioneering AI education in medical school. I recommend it to medical students and professors preparing for an AI future, and to students and parents considering medical school.
— Hyung Jin Choi, Professor, Department of Anatomy, Seoul National University College of Medicine; internal medicine specialist
Extreme views and attitudes currently coexist regarding the adoption of medical AI. Through rich examples and deep insight, this book provides a balanced perspective on the present and future of medical AI, and opens the forum for discussion needed before AI is adopted into medicine in earnest. Looking back ten years from now, when medical AI is commonplace, I expect we will find that this book served as a guide that led the way.
— Kyu-Hwan Jung, CTO, VUNO
Medical AI demands a more fundamental understanding than AI in other fields, because it goes beyond simply replacing human work to shift the paradigm of medicine onto a data-driven footing. We therefore need a balanced understanding of AI and deep reflection on how it can help doctors and patients. That is why this book, which brings together the results of such efforts worldwide, is so welcome.
— Seungwook Paek, CEO, Lunit
This book covers not only the latest developments in medical AI but also its significance, limitations, and outlook, along with plenty of food for thought. On contentious issues, the author presents his own views persuasively, grounded in clear evidence. I personally plan to use this book as a textbook for my graduate course.
— Soo-Yong Shin, Professor, Department of Digital Health, Sungkyunkwan University
Price: ₩20,000
ISBN 979-11-86269-99-2
The present and future of medical artificial intelligence,
presented by future-medicine researcher Dr. Yoon Sup Choi
Where medical deep learning and IBM Watson stand today
Will artificial intelligence replace doctors?
Inevitable Tsunami of Change
Korean Society of Radiology, Spring Scientific Meeting, June 2017
Vinod Khosla
Founder and first CEO of Sun Microsystems
Partner of KPCB, CEO of Khosla Ventures
Legendary venture capitalist in Silicon Valley
“Technology will replace 80% of doctors”
https://www.youtube.com/watch?time_continue=70&v=2HMPRXstSvQ
“We should stop training radiologists right now. It is completely self-evident that within five years deep learning will outperform radiologists.”
— Geoffrey Hinton, on radiology
http://rockhealth.com/2015/01/digital-health-funding-tops-4-1b-2014-year-review/
• “Q3 2018 was the best time ever to raise funding”
• Q3 2018 alone already exceeded the total investment of 2017
• An entrepreneurs' market: larger checks, at higher frequency, across all rounds
Healthcare
Health management in the broad sense that neither uses digital technology nor belongs to the professional medical domain
e.g., exercise, nutrition, sleep
Digital healthcare
Health management that uses digital technology
e.g., Internet of Things, artificial intelligence, 3D printing, VR/AR
Mobile healthcare
Digital healthcare that uses mobile technology
e.g., smartphones, Internet of Things, social media
Personal genome analysis
e.g., cancer genomics, disease risk, carrier status, drug sensitivity
e.g., wellness, ancestry analysis
Map of healthcare-related fields (ver 0.3)
Medicine
The professional medical domain: disease prevention, treatment, prescription, and management
Telehealth
Telemedicine
EDITORIAL OPEN
Digital medicine, on its way to being just plain medicine
npj Digital Medicine (2018) 1:20175; doi:10.1038/s41746-017-0005-1
There are already nearly 30,000 peer-reviewed English-language
scientific journals, producing an estimated 2.5 million articles a year.1
So why another, and why one focused specifically on digital
medicine?
To answer that question, we need to begin by defining what
“digital medicine” means: using digital tools to upgrade the
practice of medicine to one that is high-definition and far more
individualized. It encompasses our ability to digitize human beings
using biosensors that track our complex physiologic systems, but
also the means to process the vast data generated via algorithms,
cloud computing, and artificial intelligence. It has the potential to
democratize medicine, with smartphones as the hub, enabling
each individual to generate their own real world data and being
far more engaged with their health. Add to this new imaging
tools, mobile device laboratory capabilities, end-to-end digital
clinical trials, telemedicine, and one can see there is a remarkable
array of transformative technology which lays the groundwork for
a new form of healthcare.
As is obvious by its definition, the far-reaching scope of digital
medicine straddles many and widely varied expertise. Computer
scientists, healthcare providers, engineers, behavioral scientists,
ethicists, clinical researchers, and epidemiologists are just some of
the backgrounds necessary to move the field forward. But to truly
accelerate the development of digital medicine solutions in health
requires the collaborative and thoughtful interaction between
individuals from several, if not most of these specialties. That is the
primary goal of npj Digital Medicine: to serve as a cross-cutting
resource for everyone interested in this area, fostering collaborations and accelerating its advancement.
Current systems of healthcare face multiple insurmountable
challenges. Patients are not receiving the kind of care they want
and need, caregivers are dissatisfied with their role, and in most
countries, especially the United States, the cost of care is
unsustainable. We are confident that the development of new
systems of care that take full advantage of the many capabilities
that digital innovations bring can address all of these major issues.
Researchers too, can take advantage of these leading-edge
technologies as they enable clinical research to break free of the
confines of the academic medical center and be brought into the
real world of participants’ lives. The continuous capture of multiple
interconnected streams of data will allow for a much deeper
refinement of our understanding and definition of most phenotypes, with the discovery of novel signals in these enormous data
sets made possible only through the use of machine learning.
Our enthusiasm for the future of digital medicine is tempered by
the recognition that presently too much of the publicized work in
this field is characterized by irrational exuberance and excessive
hype. Many technologies have yet to be formally studied in a
clinical setting, and for those that have, too many began and
ended with an under-powered pilot program. In addition, there are
more than a few examples of digital “snake oil” with substantial
uptake prior to their eventual discrediting.2 Both of these practices are barriers to advancing the field of digital medicine.
Our vision for npj Digital Medicine is to provide a reliable,
evidence-based forum for all clinicians, researchers, and even
patients, curious about how digital technologies can transform
every aspect of health management and care. Being open source,
as all medical research should be, allows for the broadest possible
dissemination, which we will strongly encourage, including through advocating for the publication of preprints.
And finally, quite paradoxically, we hope that npj Digital
Medicine is so successful that in the coming years there will no
longer be a need for this journal, or any journal specifically
focused on digital medicine. Because if we are able to meet our
primary goal of accelerating the advancement of digital medicine,
then soon, we will just be calling it medicine. And there are
already several excellent journals for that.
ACKNOWLEDGEMENTS
Supported by the National Institutes of Health (NIH)/National Center for Advancing
Translational Sciences grant UL1TR001114 and a grant from the Qualcomm Foundation.
ADDITIONAL INFORMATION
Competing interests: The authors declare no competing financial interests.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history: The original version of this Article had an incorrect Article number of 5 and an incorrect Publication year of 2017. These errors have now been corrected in the PDF and HTML versions of the Article.
Steven R. Steinhubl1 and Eric J. Topol1
1Scripps Translational Science Institute, 3344 North Torrey Pines Court, Suite 300, La Jolla, CA 92037, USA
Correspondence: Steven R. Steinhubl (steinhub@scripps.edu) or Eric J. Topol (etopol@scripps.edu)
REFERENCES
1. Ware, M. & Mabe, M. The STM report: an overview of scientific and scholarly journal publishing 2015 [updated March]. http://digitalcommons.unl.edu/scholcom/92017 (2015).
2. Plante, T. B., Urrea, B. & MacFarlane, Z. T. et al. Validation of the instant blood
pressure smartphone App. JAMA Intern. Med. 176, 700–702 (2016).
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article’s Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2018
Received: 19 October 2017 Accepted: 25 October 2017
www.nature.com/npjdigitalmed
Published in partnership with the Scripps Translational Science Institute
What is the future of digital medicine?
Becoming just plain medicine
When digital medicine becomes 'medicine'
• Data, data, data
• Medical artificial intelligence
• Telemedicine
• VR/AR-based training and surgery
• Digital therapeutics
• Patient-driven medicine
What is the most important factor in digital medicine?
“Data! Data! Data!” he cried.“I can’t
make bricks without clay!”
- Sherlock Holmes,“The Adventure of the Copper Beeches”
Data, data, data
The foundation of future medicine
New data are measured, stored, integrated, and analyzed
in new ways, by new actors.
Type of data
Quality of data
Quantity of data
Wearable devices
Smartphones
Personal genome analysis
Artificial intelligence
Social media
Users/patients
The general public
The three steps of digital healthcare
• Step 1. Measuring the data
• Step 2. Integrating the data
• Step 3. Analyzing the data
Sci Transl Med 2015
Data source (1): Smartphones
Otoscope / dermatoscope / eye disease / skin cancer
Parasites / respiration / ECG / sleep
Diet / physical activity / fever / menstruation and pregnancy
Data source (2): Wearables
Nat Biotech 2015
http://www.rolls-royce.com/about/our-technology/enabling-technologies/engine-health-management.aspx#sense
250 sensors to monitor the “health” of the GE turbines
Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9].
PLOS Medicine 2016
Hype or Hope?
Source: Gartner
Data source (3): Genetic information
Gattaca (1997)
2003: Human Genome Project, 13 years (676 weeks), $2,700,000,000
2007: Dr. Craig Venter's genome, 4 years (208 weeks), $100,000,000
2008: Dr. James Watson's genome, 4 months (16 weeks), $1,000,000
2009: (Nature Biotechnology) 4 weeks, $48,000
2013: 1–2 weeks, ~$5,000
The $1,000 Genome is Already Here!
• In January 2017, Illumina announced the NovaSeq 5000 and 6000
• Publicly pledged to bring WES down to $100 within a few years
• Can run 60 whole-exome sequences in 2 days (under an hour per person)
[Chart: Customer growth of 23andMe, from late 2007 through April 2018, rising from near zero to more than 5,000,000 cumulative customers]
DNA SEQUENCING SOARS
Human genomes are being sequenced at an ever-increasing rate. The 1000 Genomes Project has aggregated hundreds of genomes; The Cancer Genome Atlas (TCGA) has gathered several thousand; and the Exome Aggregation Consortium (ExAC) has sequenced more than 60,000 exomes. Dotted lines show three possible future growth curves:
• Double every 7 months (historical growth rate)
• Double every 12 months (Illumina estimate)
• Double every 18 months (Moore's law)
[Chart: cumulative number of human genomes, 2001–2025, log scale from 10^0 to 10^9, marking the Human Genome Project, the first personal genome, 1000 Genomes, TCGA, ExAC, the current amount, recorded growth, and projections]
Michael Eisenstein, Nature, 2015
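The three dotted projections are simple doubling-time extrapolations. A minimal sketch of the arithmetic, using an illustrative assumed starting count rather than any figure read off the chart:

```python
def projected_genomes(current: float, years: float, doubling_months: float) -> float:
    """Extrapolate a cumulative count that doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return current * 2 ** doublings

# Illustrative base only: suppose ~1e6 genomes have been sequenced "today".
base = 1e6
for label, months in [("historical (7 mo)", 7),
                      ("Illumina estimate (12 mo)", 12),
                      ("Moore's law (18 mo)", 18)]:
    print(f"{label}: ~{projected_genomes(base, 10, months):.2e} genomes after 10 years")
```

The gap between the curves compounds quickly: over a decade, a 7-month doubling time yields roughly 140 times more genomes than an 18-month one.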
Sequencing Applications in Medicine
from Prewomb to Tomb
Cell. 2014 Mar 27; 157(1): 241–253.
Data source (4): The digital phenotype
Digital Phenotype:
Your smartphone knows if you are depressed
Ginger.io
Digital Phenotype:
Your smartphone knows if you are depressed
J Med Internet Res. 2015 Jul 15;17(7):e175.
The correlation analysis between the features and the PHQ-9 scores revealed that 6 of the 10 features were significantly correlated with the scores:
• strong correlation: circadian movement, normalized entropy, location variance
• correlation: phone usage features (usage duration and usage frequency)
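A sketch of this kind of feature-to-score correlation, using a plain Pearson coefficient; the per-subject values below are toy numbers invented for illustration, not data from the study:

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: hypothetical subjects, lower GPS location variance paired
# with higher PHQ-9 depression scores (the direction the study reports).
location_variance = [0.3, 0.8, 2.1, 0.5, 1.8, 0.4]
phq9_scores       = [20, 14, 4, 17, 6, 19]
print(round(pearson_r(location_variance, phq9_scores), 2))
```

In the actual study the significance of each correlation would also be tested; this sketch only computes the coefficient itself.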
The digital phenotype
Sachin H Jain, Brian W Powers, Jared B Hawkins & John S Brownstein
In the coming years, patient phenotypes captured to enhance health and wellness will extend to human interactions with
digital technology.
In 1982, the evolutionary biologist Richard Dawkins introduced the concept of the "extended phenotype"1, the idea that phenotypes should not be limited just to biological processes, such as protein biosynthesis or tissue growth, but extended to include all effects that a gene has on its environment inside or outside of the body of the individual organism. Dawkins stressed that many delineations of phenotypes are arbitrary. Animals and humans can modify their environments, and these modifications and associated behaviors are expressions of one's genome and, thus, part of their extended phenotype. In the animal kingdom, he cites dam building by beavers as an example of the beaver's extended phenotype1.
As personal technology becomes increasingly embedded in human lives, we think there is an important extension of Dawkins's theory: the notion of a 'digital phenotype'. Can aspects of our interface with technology be somehow diagnostic and/or prognostic for certain conditions? Can one's clinical data be linked and analyzed together with online activity and behavior data to create a unified, nuanced view of human disease? Here, we describe the concept of the digital phenotype. Although several disparate studies have touched on this notion, the framework for medicine has yet to be described. We attempt to define digital phenotype and further describe the opportunities and challenges in incorporating these data into healthcare.
Figure 1. Timeline of insomnia-related tweets from representative individuals. Density distributions (probability density functions) are shown for seven individual users over a two-year period (Jan. 2013 to July 2014). Density on the y axis highlights periods of relative activity for each user. A representative tweet from each user is shown as an example.
http://www.nature.com/nbt/journal/v33/n5/full/nbt.3223.html
Your Twitter knows if you cannot sleep
Timeline of insomnia-related tweets from representative individuals.
Nat. Biotech. 2015
Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
higher Hue (bluer)
lower Saturation (grayer)
lower Brightness (darker)
Digital Phenotype:
Your Instagram knows if you are depressed
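The HSV shifts above (bluer hue, grayer saturation, darker brightness) can be sketched with Python's standard colorsys module; the two "photos" below are tiny invented pixel lists, not images from the study:

```python
import colorsys

def mean_hsv(pixels):
    """Average hue, saturation, and value over RGB pixels with channels in [0, 1]."""
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    n = len(hsv)
    return tuple(sum(px[i] for px in hsv) / n for i in range(3))

# Toy "photos": a bright warm one vs. a dark, desaturated bluish one.
bright_warm = [(0.90, 0.70, 0.40), (0.95, 0.80, 0.50), (0.85, 0.65, 0.45)]
dark_blue   = [(0.30, 0.35, 0.45), (0.25, 0.30, 0.40), (0.20, 0.28, 0.38)]

h1, s1, v1 = mean_hsv(bright_warm)
h2, s2, v2 = mean_hsv(dark_blue)
print(f"warm photo: H={h1:.2f} S={s1:.2f} V={v1:.2f}")
print(f"blue photo: H={h2:.2f} S={s2:.2f} V={v2:.2f}")
```

The bluish photo comes out with higher hue, lower saturation, and lower value, matching the direction of the reported depression markers.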
Results
Both All-data and Pre-diagnosis models were decisively superior to a null model (K_All = 157.5; K_Pre = 149.8). All-data predictors were significant with 99% probability. Pre-diagnosis and All-data confidence levels were largely identical, with two exceptions: Pre-diagnosis Brightness decreased to 90% confidence, and Pre-diagnosis posting frequency dropped to 30% confidence, suggesting a null predictive value in the latter case.
Increased hue, along with decreased brightness and saturation, predicted depression. This means that photos posted by depressed individuals tended to be bluer, darker, and grayer (see Fig. 2). The more comments Instagram posts received, the more likely they were posted by depressed participants, but the opposite was true for likes received. In the All-data model, higher posting frequency was also associated with depression. Depressed participants were more likely to post photos with faces, but had a lower average face count per photograph than healthy participants. Finally, depressed participants were less likely to apply Instagram filters to their posted photos.

Fig. 2. Magnitude and direction of regression coefficients in All-data (N=24,713) and Pre-diagnosis (N=18,513) models. X-axis values represent the adjustment in odds of an observation belonging to depressed individuals, per
Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
 
 
Fig. 1. Comparison of HSV values. Right photograph has higher Hue (bluer), lower Saturation (grayer), and lower Brightness (darker) than left photograph. Instagram photos posted by depressed individuals had HSV values shifted towards those in the right photograph, compared with photos posted by healthy individuals.
Units of observation
In determining the best time span for this analysis, we encountered a difficult question: When and for how long does depression occur? A diagnosis of depression does not indicate the persistence of a depressive state for every moment of every day, and to conduct analysis using an individual's entire posting history as a single unit of observation is therefore rather specious. At the other extreme, to take each individual photograph as units of observation runs the risk of being too granular. De Choudhury et al. (5) looked at all of a given user's posts in a single day, and aggregated those data into per-person, per-day units of observation. We adopted this precedent of "user-days" as a unit of analysis.

Statistical framework
We used Bayesian logistic regression with uninformative priors to determine the strength of individual predictors. Two separate models were trained. The All-data model used all collected data to address Hypothesis 1. The Pre-diagnosis model used all data collected from
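The statistical framework above can be sketched with a plain maximum-likelihood logistic fit, which coincides with the Bayesian posterior mode when the priors are flat; the user-day observations below are invented for illustration:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression fit by gradient ascent on the log-likelihood.
    With uninformative (flat) priors, the Bayesian posterior mode equals this MLE."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (y - p) * x
            gb += (y - p)
        w += lr * gw / len(xs)
        b += lr * gb / len(xs)
    return w, b

# Toy user-day data: x = mean photo brightness, y = 1 if labeled depressed.
xs = [0.20, 0.25, 0.30, 0.35, 0.60, 0.70, 0.75, 0.80]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(xs, ys)
print(w < 0)  # darker photos correspond to higher odds of the 'depressed' label
```

The sign of the fitted coefficient is what the paper's Fig. 2 reports per feature; the paper itself fits many predictors jointly rather than one at a time.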
(χ²_All = 907.84, p = 9.17e−164; χ²_Pre = 813.80, p = 2.87e−144). In particular, depressed participants were less likely than healthy participants to use any filters at all. When depressed participants did employ filters, they most disproportionately favored the "Inkwell" filter, which converts color photographs to black-and-white images. Conversely, healthy participants most disproportionately favored the Valencia filter, which lightens the tint of photos. Examples of filtered photographs are provided in SI Appendix VIII.
 
Fig. 3. Instagram filter usage among depressed and healthy participants. Bars indicate difference between observed and expected usage frequencies, based on a Chi-squared analysis of independence. Blue bars indicate disproportionate use of a filter by depressed compared to healthy participants, orange bars indicate the reverse.
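The observed-minus-expected bars from a chi-squared independence analysis can be sketched as follows; the contingency counts are invented for illustration, not the study's:

```python
def expected_counts(table):
    """Expected cell counts under independence for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for row, erow in zip(table, exp)
               for o, e in zip(row, erow))

# Toy counts: rows = (depressed, healthy); cols = (Inkwell, Valencia, no filter).
observed = [[40, 10, 150],
            [15, 60, 125]]
exp = expected_counts(observed)
# Bar heights in a plot like the paper's Fig. 3 are observed - expected per cell.
diffs = [[o - e for o, e in zip(row, erow)] for row, erow in zip(observed, exp)]
print(diffs)
print(round(chi_square(observed), 2))
```

A positive difference in the depressed row for Inkwell and a positive one in the healthy row for Valencia reproduce the pattern the figure describes.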
 
VIII. Instagram filter examples
Fig. S8. Examples of Inkwell and Valencia Instagram filters. Inkwell converts color photos to black-and-white, Valencia lightens tint. Depressed participants most favored Inkwell compared to healthy participants, Healthy participants
Data source (5): The microbiome
Leading Edge
Review
Individualized Medicine
from Prewomb to Tomb
Eric J. Topol1 ,*
1The Scripps Translational Science Institute, The Scripps Research Institute and Scripps Health, La Jolla, CA 92037, USA
*Correspondence: etopol@scripps.edu
http://dx.doi.org/10.1016/j.cell.2014.02.012
That each of us is truly biologically unique, extending to even monozygotic, ‘‘identical’’ twins, is not
fully appreciated. Now that it is possible to perform a comprehensive ‘‘omic’’ assessment of an
individual, including one’s DNA and RNA sequence and at least some characterization of one’s
proteome, metabolome, microbiome, autoantibodies, and epigenome, it has become abundantly
clear that each of us has truly one-of-a-kind biological content. Well beyond the allure of the match-
less fingerprint or snowflake concept, these singular, individual data and information set up a
remarkable and unprecedented opportunity to improve medical treatment and develop preventive
strategies to preserve health.
From Digital to Biological to Individualized Medicine
In 2010, Eric Schmidt of Google said ‘‘The power of individual
targeting—the technology will be so good it will be very hard
for people to watch or consume something that has not in
some sense been tailored for them’’ (Jenkins, 2010). Although
referring to the capability of digital technology, we have now
reached a time of convergence of the digital and biologic do-
mains. It has been well established that 0 and 1 are interchange-
able with A, C, T, and G in books and Shakespeare sonnets and
that DNA may represent the ultimate data storage system
(Church et al., 2012; Goldman et al., 2013b). Biological transis-
tors, also known as genetic logic gates, have now been devel-
oped that make a computer from a living cell (Bonnet et al.,
2013). The convergence of biology and technology was further
captured by one of the protagonists of the digital era, Steve
Jobs, who said ‘‘I think the biggest innovations of the 21st century will be at the intersection of biology and technology. A new era is beginning’’ (Isaacson, 2011).
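The interchangeability of {0, 1} with {A, C, T, G} that underlies DNA data storage can be illustrated with a toy two-bits-per-base code; real DNA-storage schemes (e.g., Church et al., Goldman et al.) add error correction and avoid problematic sequences such as long homopolymer runs:

```python
# Toy convention, assumed for illustration: 00→A, 01→C, 10→G, 11→T.
B2N = {"00": "A", "01": "C", "10": "G", "11": "T"}
N2B = {v: k for k, v in B2N.items()}

def encode(data: bytes) -> str:
    """Pack each byte into four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(B2N[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> bytes:
    """Invert encode(): four bases back into one byte."""
    bits = "".join(N2B[n] for n in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

seq = encode(b"sonnet")
print(seq)
print(decode(seq))
```

Any byte string, a Shakespeare sonnet included, round-trips through the four-letter alphabet at four bases per byte.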
With whole-genome DNA sequencing and a variety of omic
technologies to define aspects of each individual’s biology at
many different levels, we have indeed embarked on a new era
of medicine. The term ‘‘personalized medicine’’ has been used
for many years but has engendered considerable confusion. A
recent survey indicated that only 4% of the public understand
what the term is intended to mean (Stanton, 2013), and the hack-
neyed, commercial use of ‘‘personalized’’ makes many people
think that this refers to a concierge service of medical care.
Whereas ‘‘person’’ refers to a human being, ‘‘personalized’’
can mean anything from having monogrammed stationary or
luggage to ascribing personal qualities. Therefore, it was not
surprising that a committee representing the National Academy
of Sciences proposed using the term ‘‘precision medicine’’ as
defined by ‘‘tailoring of medical treatment to the individual char-
acteristics of each patient’’ (National Research Council, 2011).
Although the term ‘‘precision’’ denotes the objective of exact-
ness, ironically, it too can be viewed as ambiguous in this context
because it does not capture the sense that the information is
derived from the individual. For example, many laboratory tests
could be made more precise by assay methodology, and treat-
ments could be made more precise by avoiding side effects—
without having anything to do with a specific individual. Other
terms that have been suggested include genomic, digital, and
stratified medicine, but all of these have a similar problem or
appear to be too narrowly focused.
The definition of individual is a single human being, derived
from the Latin word individu, or indivisible. I propose individual-
ized medicine as the preferred term because it has a useful
double entendre. It relates not only to medicine that is particular-
ized to a human being but also the future impact of digital
technology on individuals driving their health care. There will
increasingly be the flow of one’s biologic data and relevant
medical information directly to the individual. Be it a genome
sequence on a tablet or the results of a biosensor for blood pres-
sure or another physiologic metric displayed on a smartphone,
the digital convergence with biology will definitively anchor the
individual as a source of salient data, the conduit of information
flow, and a—if not the—principal driver of medicine in the future.
The Human GIS
Perhaps the most commonly used geographic information
systems (GIS) are Google maps, which provide a layered
approach to data visualization, such as viewing a location via
satellite overlaid with street names, landmarks, and real-time
traffic data. This GIS exemplifies the concept of gathering and
transforming large bodies of data to provide exquisite temporal
and location information. With the multiple virtual views, it gives
one the sense of physically being on site. Although Google has
digitized and thus created a GIS for the Earth, it is now possible
to digitize a human being. As shown in Figure 1, there are multi-
ple layers of data that can now be obtained for any individual.
This includes data from biosensors, scanners, electronic medi-
cal records, social media, and the various omics that include
Cell 157, March 27, 2014 ©2014 Elsevier Inc. 241
Leading Edge
Review
Individualized Medicine
from Prewomb to Tomb
Eric J. Topol1 ,*
1The Scripps Translational Science Institute, The Scripps Research Institute and Scripps Health, La Jolla, CA 92037, USA
*Correspondence: etopol@scripps.edu
http://dx.doi.org/10.1016/j.cell.2014.02.012
That each of us is truly biologically unique, extending to even monozygotic, ‘‘identical’’ twins, is not
fully appreciated. Now that it is possible to perform a comprehensive ‘‘omic’’ assessment of an
individual, including one’s DNA and RNA sequence and at least some characterization of one’s
proteome, metabolome, microbiome, autoantibodies, and epigenome, it has become abundantly
clear that each of us has truly one-of-a-kind biological content. Well beyond the allure of the match-
less fingerprint or snowflake concept, these singular, individual data and information set up a
remarkable and unprecedented opportunity to improve medical treatment and develop preventive
strategies to preserve health.
From Digital to Biological to Individualized Medicine
In 2010, Eric Schmidt of Google said ‘‘The power of individual
targeting—the technology will be so good it will be very hard
for people to watch or consume something that has not in
some sense been tailored for them’’ (Jenkins, 2010). Although
referring to the capability of digital technology, we have now
reached a time of convergence of the digital and biologic do-
mains. It has been well established that 0 and 1 are interchange-
able with A, C, T, and G in books and Shakespeare sonnets and
that DNA may represent the ultimate data storage system
(Church et al., 2012; Goldman et al., 2013b). Biological transis-
tors, also known as genetic logic gates, have now been devel-
oped that make a computer from a living cell (Bonnet et al.,
2013). The convergence of biology and technology was further
captured by one of the protagonists of the digital era, Steve
Jobs, who said ‘‘I think the biggest innovations of the 21st
cen-
tury will be at the intersection of biology and technology. A
new era is beginning’’ (Issacson, 2011).
With whole-genome DNA sequencing and a variety of omic
technologies to define aspects of each individual’s biology at
many different levels, we have indeed embarked on a new era
of medicine. The term ‘‘personalized medicine’’ has been used
for many years but has engendered considerable confusion. A
recent survey indicated that only 4% of the public understand
what the term is intended to mean (Stanton, 2013), and the hack-
neyed, commercial use of ‘‘personalized’’ makes many people
think that this refers to a concierge service of medical care.
Whereas ‘‘person’’ refers to a human being, ‘‘personalized’’
can mean anything from having monogrammed stationary or
luggage to ascribing personal qualities. Therefore, it was not
surprising that a committee representing the National Academy
of Sciences proposed using the term ‘‘precision medicine’’ as
defined by ‘‘tailoring of medical treatment to the individual char-
acteristics of each patient’’ (National Research Council, 2011).
Although the term ‘‘precision’’ denotes the objective of exact-
ness, ironically, it too can be viewed as ambiguous in this context
because it does not capture the sense that the information is
derived from the individual. For example, many laboratory tests
could be made more precise by assay methodology, and treat-
ments could be made more precise by avoiding side effects—
without having anything to do with a specific individual. Other
terms that have been suggested include genomic, digital, and
stratified medicine, but all of these have a similar problem or
appear to be too narrowly focused.
The definition of individual is a single human being, derived
from the Latin word individu, or indivisible. I propose individual-
ized medicine as the preferred term because it has a useful
double entendre. It relates not only to medicine that is particular-
ized to a human being but also the future impact of digital
technology on individuals driving their health care. There will
increasingly be the flow of one’s biologic data and relevant
medical information directly to the individual. Be it a genome
sequence on a tablet or the results of a biosensor for blood pres-
sure or another physiologic metric displayed on a smartphone,
the digital convergence with biology will definitively anchor the
individual as a source of salient data, the conduit of information
flow, and a—if not the—principal driver of medicine in the future.
The Human GIS
Perhaps the most commonly used geographic information systems (GIS) are Google maps, which provide a layered approach to data visualization, such as viewing a location via satellite overlaid with street names, landmarks, and real-time traffic data. This GIS exemplifies the concept of gathering and transforming large bodies of data to provide exquisite temporal and location information. With the multiple virtual views, it gives one the sense of physically being on site. Although Google has digitized and thus created a GIS for the Earth, it is now possible to digitize a human being. As shown in Figure 1, there are multiple layers of data that can now be obtained for any individual. This includes data from biosensors, scanners, electronic medical records, social media, and the various omics that include
Cell 157, March 27, 2014 © 2014 Elsevier Inc. 241
[right-hand column truncated in extraction; the surviving fragments cover the first human GIS and the opening of the section "The Omic Tools: Whole-Genome an…"]
Figure 1. Geographic Information System of a Human Being
Personalized Medicine & Evidence-Based Medicine
In evidence-based medicine, the higher the level of evidence,
the more patients tend to be abstracted into groups rather than treated as individuals.
Instead of the individual patient, results are expressed as distributions over patient populations,
and the goal is to find statistical significance between those distributions.
If, based on all available multidimensional data,
we could find the treatment matched to each individual patient's characteristics:
N-of-One Trial?
N-of-One Medicine!
Two strategies for data-driven medicine

• top-down: form a hypothesis first, then collect the relevant kinds of data to test it.

• bottom-up: just collect as much of 'all' the data as possible, and something big is bound to turn up.
© 2017 Nature America, Inc., part of Springer Nature. All rights reserved.
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION 1
ARTICLES
In order to understand the basis of wellness and disease, we and others have pursued a global and holistic approach termed 'systems medicine'1. The defining feature of systems medicine is the collection of diverse longitudinal data for each individual. These data sets can be used to unravel the complexity of human biology and disease by assessing both genetic and environmental determinants of health and their interactions. We refer to such data as personal, dense, dynamic data clouds: personal, because each data cloud is unique to an individual; dense, because of the high number of measurements; and dynamic, because we monitor longitudinally. The convergence of advances in systems medicine, big data analysis, individual measurement devices, and consumer-activated social networks has led to a vision of healthcare that is predictive, preventive, personalized, and participatory (P4)2, also known as 'precision medicine'. Personal, dense, dynamic data clouds are indispensable to realizing this vision3.

The US healthcare system invests 97% of its resources on disease care4, with little attention to wellness and disease prevention. Here we investigate scientific wellness, which we define as a quantitative data-informed approach to maintaining and improving health and avoiding disease.

Several recent studies have illustrated the utility of multi-omic longitudinal data to look for signs of reversible early disease or disease risk factors in single individuals. The dynamics of human gut and salivary microbiota in response to travel abroad and enteric infection was characterized in two individuals using daily stool and saliva samples5. Daily multi-omic data collection from one individual over 14 months identified signatures of respiratory infection and the onset of type 2 diabetes6. Crohn's disease progression was tracked over many years in one individual using regular blood and stool measurements7. Each of these studies yielded insights into system dynamics even though they had only one or two participants.

We report the generation and analysis of personal, dense, dynamic data clouds for 108 individuals over the course of a 9-month study that we call the Pioneer 100 Wellness Project (P100). Our study included whole genome sequences; clinical tests, metabolomes, proteomes, and microbiomes at 3-month intervals; and frequent activity measurements (i.e., wearing a Fitbit). This study takes a different approach from previous studies, in that a broad set of assays were carried out less frequently in a (comparatively) large number of people. Furthermore, we identified 'actionable possibilities' for each individual to enhance her/his health. Risk factors that we observed in participants' clinical markers and genetics were used as a starting point to identify actionable possibilities for behavioral coaching.

We report the correlations among different data types and identify population-level changes in clinical markers. This project is the pilot for the 100,000 (100K) person wellness project that we proposed in 2014 (ref. 8). An increased scale of personal, dense, dynamic data clouds in future holds the potential to improve our understanding of scientific wellness and delineate early warning signs for human diseases.
A wellness study of 108 individuals using personal, dense, dynamic data clouds
Nathan D Price, Andrew T Magis, John C Earls, Gustavo Glusman, Roie Levy, Christopher Lausted, Daniel T McDonald, Ulrike Kusebauch, Christopher L Moss, Yong Zhou, Shizhen Qin, Robert L Moritz, Kristin Brogaard, Gilbert S Omenn, Jennifer C Lovejoy & Leroy Hood
Institute for Systems Biology, Seattle; Arivale, Seattle; University of Michigan, Ann Arbor; Providence St. Joseph Health, Seattle; University of California, San Diego. Correspondence: N.D.P. (nathan.price@systemsbiology.org) or L.H. (lhood@systemsbiology.org).
Received 16 October 2016; accepted 11 April 2017; published online 17 July 2017; doi:10.1038/nbt.3870

Personal data for 108 individuals were collected during a 9-month period, including whole genome sequences; clinical tests, metabolomes, proteomes, and microbiomes at three time points; and daily activity tracking. Using all of these data, we generated a correlation network that revealed communities of related analytes associated with physiology and disease. Connectivity within analyte communities enabled the identification of known and candidate biomarkers (e.g., gamma-glutamyltyrosine was densely interconnected with clinical analytes for cardiometabolic disease). We calculated polygenic scores from genome-wide association studies (GWAS) for 127 traits and diseases, and used these to discover molecular correlates of polygenic risk (e.g., genetic risk for inflammatory bowel disease was negatively correlated with plasma cystine). Finally, behavioral coaching informed by personal data helped participants to improve clinical biomarkers. Our results show that measurement of personal data clouds over time can improve our understanding of health and disease, including early transitions to disease states.

RESULTS
The P100 study had four objectives. First, establish cost-efficient procedures for generating, storing, and analyzing multiple sources [text cut off in extraction]
Leroy Hood, MD, PhD (Institute for Systems Biology)
Pioneer 100 Wellness Project
(pilot of 100K person wellness project)
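The polygenic scores mentioned in the abstract are, at their simplest, weighted sums of risk-allele counts using GWAS effect sizes. A minimal sketch of that idea (the variant IDs, dosages, and weights below are invented for illustration, not taken from the study):

```python
# Minimal sketch of a polygenic score: a weighted sum of risk-allele
# dosages using GWAS effect sizes. Variant IDs and weights here are
# made up; real scores use published GWAS summary statistics.

def polygenic_score(genotype, weights):
    """genotype: variant -> allele dosage (0, 1, or 2).
    weights:  variant -> GWAS effect size (beta or log odds ratio).
    Variants missing from the genotype contribute nothing."""
    return sum(weights[v] * genotype.get(v, 0) for v in weights)

# Hypothetical individual and effect sizes
genotype = {"rs1": 2, "rs2": 0, "rs3": 1}
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}

score = polygenic_score(genotype, weights)
print(round(score, 2))  # 2*0.10 + 0*(-0.05) + 1*0.20 = 0.40
```

The paper then correlates such scores against molecular analytes (e.g., IBD risk versus plasma cystine).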
Intro
Figure 1 (study design): three rounds of coaching sessions across months 1–9, with sampling at 3-month intervals.
- Clinical labs (blood): cardiovascular (HDL/LDL cholesterol, triglycerides, particle profiles, and other markers); diabetes risk (fasting glucose, HbA1c, insulin, and other markers); inflammation (IL-6, IL-8, and other markers); nutrition and toxins (ferritin, vitamin D, glutathione, mercury, lead, and other markers)
- Genetics (blood): whole genome sequence
- Proteomics (blood): inflammation, cardiovascular, liver, brain, and heart-related proteins
- Metabolomics (blood): xenobiotics and metabolism-related small molecules
- Gut microbiome (stool): 16S rRNA sequencing
- Quantified self (activity tracker): daily activity
- Stress (saliva): four-point cortisol
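One convenient way to handle a multi-assay, multi-round design like the one above is a long-format table with one row per participant, round, and analyte. A minimal sketch with pandas (all participant IDs, analytes, and values are invented):

```python
# Minimal long-format representation of a multi-assay, multi-round
# study design: one record per (participant, round, assay, analyte).
# All identifiers and values below are invented.
import pandas as pd

records = [
    {"participant": "P001", "round": 1, "assay": "clinical_labs",
     "analyte": "HbA1c", "value": 5.9},
    {"participant": "P001", "round": 2, "assay": "clinical_labs",
     "analyte": "HbA1c", "value": 5.7},
    {"participant": "P001", "round": 1, "assay": "metabolomics",
     "analyte": "tyrosine", "value": 61.0},
    {"participant": "P002", "round": 1, "assay": "clinical_labs",
     "analyte": "HbA1c", "value": 5.4},
]
df = pd.DataFrame(records)

# Pivot to one column per analyte for cross-sectional analysis
wide = df.pivot_table(index=["participant", "round"],
                      columns="analyte", values="value")
print(wide)
```

The long format makes it easy to add new assays without changing the schema; pivoting recovers the analyte-per-column shape that correlation analyses expect.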
Measure every kind of multidimensional data available
[Figure 2 network graphic: analyte node labels from the proteomics (Olink CVD/inflammation, SRM liver), genetic traits, microbiome, clinical labs (Quest, Genova), and metabolomics panels omitted in extraction; see the caption below.]
Figure 2 Top 100 correlations per pair of data types. Subset of top statistically significant Spearman inter-omic cross-sectional correlations between all data sets collected in our cohort. Each line represents one correlation that was significant after adjustment for multiple hypothesis testing using the method of Benjamini and Hochberg10 at padj < 0.05. The mean of all three time points was used to compute the correlations between analytes. Up to 100 correlations per pair of data types are shown in this figure. See Supplementary Figure 1 and Supplementary Table 2 for the complete inter-omic cross-sectional network.
Nature Biotechnology 2017
Among all the measured data types, the 100 pairs with the highest correlations were selected
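The Figure 2 analysis pairs Spearman correlation with Benjamini–Hochberg adjustment. A minimal sketch of that pipeline with scipy and statsmodels; the analyte matrix here is randomly simulated (with one planted correlated pair), not the study's data:

```python
# Pairwise Spearman correlations between analytes, adjusted for multiple
# testing with Benjamini-Hochberg; keep edges at adjusted p < 0.05.
# The data matrix is a random stand-in with one real signal planted.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_subjects, n_analytes = 100, 6
data = rng.normal(size=(n_subjects, n_analytes))
data[:, 1] = data[:, 0] + 0.3 * rng.normal(size=n_subjects)  # planted signal

pairs = list(combinations(range(n_analytes), 2))
pvals = [spearmanr(data[:, i], data[:, j]).pvalue for i, j in pairs]

# BH adjustment; retained pairs are the network's significant edges
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
edges = [pair for pair, keep in zip(pairs, reject) if keep]
print(edges)
```

In the paper the same idea is applied across thousands of analytes, using the mean of the three time points per analyte.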
edges. The majority of edges involved a metabolite (3,309) or a clinical laboratory test (3,366), with an additional 20 edges involving the 130 genetic traits tested, 46 with microbiome taxa or diversity score, and 207 with quantified proteins. The inter-omic delta correlation network contained 822 nodes and 2,406 edges. 375 of the edges in the delta correlation network were also present in the cross-sectional network. The cross-sectional correlation network is provided in Supplementary Table 2 (inter-omic only) and Supplementary Table 3 (full). The delta correlation network is provided in Supplementary Table 4 (inter-omic only) and Supplementary Table 5 (full).

We identified clusters of related measurements from the cross-sectional inter-omic correlation network using community analysis, an unsupervised (i.e., using unlabeled data to find hidden structure) approach that iteratively prunes the network (removing the edges with the highest betweenness) to reveal densely interconnected subgraphs (communities)11. Seventy communities of at least two vertices (mean of 10.9 V and 34.9 E) were identified in the cross-sectional inter-omic network at the cutoff with maximum community modularity12 (Supplementary Fig. 2), and are fully visualized as an interactive graph in Cytoscape13 (Supplementary Dataset 1). 70% of the edges in the cross-sectional network remained after community edge pruning. The communities often represented a cluster of physiologically related analytes, as described below.
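Iteratively removing the highest-betweenness edges, as described above, is the Girvan–Newman algorithm. A toy sketch with networkx: the seven-node graph below is invented (two tight analyte clusters joined by one bridge), not the paper's actual network:

```python
# Girvan-Newman community detection: repeatedly remove the edge with
# the highest betweenness until the graph splits into communities.
# The toy graph below stands in for the real inter-omic network.
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.Graph()
G.add_edges_from([
    ("insulin", "HOMA-IR"), ("insulin", "C-peptide"),
    ("HOMA-IR", "C-peptide"),                       # cluster 1
    ("IL-6", "IL-8"), ("IL-6", "CRP"), ("IL-8", "CRP"),  # cluster 2
    ("C-peptide", "CRP"),                           # bridge between them
])

# The first yielded partition appears once the bridge is pruned away
communities = next(girvan_newman(G))
print(sorted(sorted(c) for c in communities))
```

The bridge edge carries every shortest path between the two triangles, so it has the highest betweenness and is removed first, leaving the two densely interconnected subgraphs as communities.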
[Figure 3 network graphic: node labels for metabolites (amino acid, lipid, carbohydrate, nucleotide, energy, peptide, xenobiotic, and vitamin/cofactor metabolism), clinical labs (Quest, Genova), microbiome taxa, genetic traits, and proteins (Olink CVD/inflammation, SRM liver) omitted in extraction; see the caption below.]
Figure 3 Cardiometabolic community. All vertices and edges of the cardiometabolic community, with lines indicating significant (padj < 0.05) correlations. Associations with FGF21 (red lines) and gamma-glutamyltyrosine (purple lines) are highlighted.
• Analysis of the inter-omic correlation network grouped the measured analytes into clusters (communities)

• Largest cluster (246 vertices, 1,645 edges): cardiometabolic health

• Four most-connected clinical analytes: C-peptide, insulin, HOMA-IR, triglycerides

• Four most-connected proteins: leptin, C-reactive protein, FGF21, INHBC

gamma-glutamyltyrosine
FGF21
The largest community (246 V; 1,645 E) contains many clinical analytes associated with cardiometabolic health, such as C-peptide, triglycerides, insulin, homeostatic risk assessment–insulin resistance (HOMA-IR), fasting glucose, high-density lipid (HDL) cholesterol, and small low-density lipid (LDL) particle number (Fig. 3). The four most-connected clinical analytes by degree (the number of edges connecting a particular analyte) were C-peptide (degree 99), insulin (88), HOMA-IR (88), and triglycerides (75). The four most-connected proteins measured using targeted (i.e., selected reaction monitoring analysis) mass spectrometry or Olink proximity extension assays by degree are leptin (18), C-reactive protein (15), fibroblast growth factor 21 (FGF21) (14), and inhibin beta C chain (INHBC) (10). Leptin and C-reactive protein are indicators for cardiovascular risk14,15. FGF21 is positively correlated with the clinical analytes […] (ρ = −0.41; padj = 2.1 × 10−3). Hypothyroidism has long been recognized clinically as a cause of elevated cholesterol values19.

A community formed around plasma serotonin (18 V; 25 E) containing 12 proteins listed in Supplementary Table 6, for which the most significant enrichment identified in a STRING ontology analysis20 was platelet activation (padj = 1.7 × 10−3) (Fig. 4b). Serotonin is known to induce platelet aggregation21; accordingly, selective serotonin reuptake inhibitors (SSRIs) may protect against myocardial infarction22.

We identified several communities containing microbiome taxa, suggesting that there are specific microbiome–analyte relationships. Hydrocinnamate, l-urobilin, and 5-hydroxyhexanoate clustered with the bacterial class Mollicutes and family Christensenellaceae (8 V; 8 E). Another community emerged around the Verrucomicrobiaceae and Desulfovibrionaceae families and p-cresol-sulfate (7 V; 6 E). The
Figure 4 Cholesterol, serotonin, α-diversity, IBD, and bladder cancer communities. (a) Cholesterol community. (b) Serotonin community. (c) α-diversity community. (d) The polygenic score for inflammatory bowel disease is negatively correlated with cystine. (e) The polygenic score for bladder cancer is positively correlated with 5-acetylamino-6-formylamino-3-methyluracil (AFMU).
[…] identified with elevated fasting glucose or HbA1c at baseline (prediabetes), the coach made recommendations based on the Diabetes Prevention Program36, customized for each person's lifestyle. These individual recommendations typically fell into one of several major […] factors (fasting insulin and HOMA-IR), and inflammation (IL-8 and TNF-alpha). Lipoprotein fractionation, performed by both laboratory companies, produced significant but discordant results for LDL particle number. We observed significant improvements in fasting […]
Table 1 Longitudinal analysis of clinical changes by round (changes in labs for participants out-of-range at baseline)
Health area | Name | N | Δ per round | P-value
Nutrition Vitamin D 95 +7.2 ng/mL/round 7.1 × 10−25
Nutrition Mercury 81 −0.002 mcg/g/round 8.9 × 10−9
Diabetes HbA1c 52 −0.085%/round 9.2 × 10−6
Cardiovascular LDL particle number (Quest) 30 +130 nmol/L/round 9.3 × 10−5
Nutrition Methylmalonic acid (Genova) 3 −0.49 mmol/mol creatinine/round 2.1 × 10−4
Cardiovascular LDL pattern (A or B) 28 −0.16 /round 4.8 × 10−4
Inflammation Interleukin-8 10 −6.1 pg/mL/round 5.9 × 10−4
Cardiovascular Total cholesterol (Quest) 48 −6.4 mg/dL/round 7.2 × 10−4
Cardiovascular LDL cholesterol 57 −4.8 mg/dL/round 8.8 × 10−4
Cardiovascular LDL particle number (Genova) 70 −69 nmol/L/round 1.2 × 10−3
Cardiovascular Small LDL particle number (Genova) 73 −56 nmol/L/round 3.5 × 10−3
Diabetes Fasting glucose (Quest) 45 −1.9 mg/dL/round 8.2 × 10−3
Cardiovascular Total cholesterol (Genova) 43 −5.4 mg/dL/round 1.2 × 10−2
Diabetes Insulin 16 −2.3 IU/mL/round 1.5 × 10−2
Inflammation TNF-alpha 4 −6.6 pg/mL/round 1.8 × 10−2
Diabetes HOMA-IR 19 −0.56 /round 2.0 × 10−2
Cardiovascular HDL cholesterol 5 +4.5 mg/dL/round 2.2 × 10−2
Nutrition Methylmalonic acid (Quest) 7 −42 nmol/L/round 5.2 × 10−2
Cardiovascular Triglycerides (Genova) 14 −18 mg/dL/round 1.4 × 10−1
Diabetes Fasting glucose (Genova) 47 −0.98 mg/dL/round 1.5 × 10−1
Nutrition Arachidonic acid 35 +0.24 wt%/round 1.9 × 10−1
Inflammation hs-CRP 51 −0.47 mcg/mL/round 2.1 × 10−1
Cardiovascular Triglycerides (Quest) 17 −14 mg/dL/round 2.4 × 10−1
Nutrition Glutathione 6 +11 micromol/L/round 2.5 × 10−1
Nutrition Zinc 4 −0.82 mcg/g/round 3.0 × 10−1
Nutrition Ferritin 10 −14 ng/mL/round 3.1 × 10−1
Inflammation Interleukin-6 4 −1.1 pg/mL/round 3.8 × 10−1
Cardiovascular HDL large particle number 8 +210 nmol/L/round 4.9 × 10−1
Nutrition Copper 10 +0.006 mcg/g/round 6.0 × 10−1
Nutrition Selenium 6 +0.035 mcg/g/round 6.2 × 10−1
Cardiovascular Medium LDL particle number 20 +2.8 nmol/L/round 8.5 × 10−1
Cardiovascular Small LDL particle number (Quest) 14 −2.3 nmol/L/round 8.8 × 10−1
Nutrition Manganese 0 N/A N/A
Nutrition EPA 0 N/A N/A
Nutrition DHA 0 N/A N/A
Generalized estimating equations (GEE) were used to calculate average changes in clinical laboratory tests over time, for those analytes that were actively coached on. The 'Δ per round' column is the average change in the population for that analyte by round, adjusted for age, sex, and self-reported ancestry. 'Out-of-range at baseline' indicates the average change using only those participants who were out-of-range for that analyte at the beginning of the study. Rows in boldface indicate statistically significant improvement, while the italicized row indicates statistically significant worsening. N/A values are present where no participants were out-of-range at baseline. For example, the average improvement in vitamin D for the 95 participants that began the study out-of-range was +7.2 ng/mL per round. Several analytes are measured by both Quest and Genova; with the exception of LDL particle number, the direction of effect for significantly changed analytes was concordant across the two laboratories. An independence working correlation structure was used in the GEE. See Supplementary Table 10 for the complete results.
• When a value fell outside the normal range, a coach intervened to encourage lifestyle changes that could improve it

• For example, for elevated fasting glucose or HbA1c, the coach recommended the Diabetes Prevention Program (DPP)

• Recommendations fell into a few major categories:

• diet, exercise, stress management, dietary supplements, physician referral

• The values that improved most through this process:

• vitamin D, mercury, HbA1c

• Overall, cholesterol-related, diabetes-risk, and inflammation markers improved
• Verily (Google)'s Baseline Project

• A project to redefine health and disease

• Tracks the health of 10,000 individuals in detail over four years, accumulating data

• Heart rate, sleep patterns, genetic information, emotional state, medical records, family history, urine/saliva/blood tests, and more
• Verily's 'Study Watch'

• A smartwatch for the Baseline Study, unveiled in April 2017

• Measures ECG, heart rate, electrodermal activity (EDA), inertial movement, and more

• Features for long-term follow-up studies: one-week battery life, on-device data storage, weekly synchronization
• Linda Avey's Precise.ly

• Founded in 2011 by Linda Avey, co-founder of 23andMe, after she left that company in 2009

• Recently renamed from 'We Are Curious' to Precise.ly
• Linda Avey's Precise.ly

• Combines genotype + phenotype + microbiome + environment for medical insight

• Genotype: analyzed by whole-exome sequencing (WES) on Helix's platform

• Phenotype: collected with wearables and IoT devices
• States that it will mainly target 'modern diseases'

• For example, can autism spectrum disorder be subclassified on the basis of multidimensional data?

• Plans to launch an app for chronic fatigue first on the Helix platform,

• with apps for autism, Parkinson's disease (PD), and others to follow.
iCarbonX

• Founded by Jun Wang, former CEO of China's BGI

• Plans to 'measure all the data' and apply it to precision medicine

• Invests in and acquires companies capable of measuring such data

• SomaLogic, HealthTell, PatientsLikeMe

• Plans to collect data from 1 to 10 million people over the next five years

• The data will be analyzed with artificial intelligence
Arivale, the Baseline Project, Precise.ly, and iCarbonX
are not all doing well at present,
but they can be read as early attempts at this kind of change.
Medical artificial intelligence
An unavoidable future
So what are we going to do with all that data?
You have to ask good questions to get good answers
Martin Duggan,“IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing”
What are today's medical students and residents learning?
Three types of medical artificial intelligence

• Analysis of complex medical data to derive insights

• Analysis and interpretation of medical imaging and pathology data

• Monitoring of continuous data for prediction and prevention
Jeopardy!
In 2011, Watson competed against two human champions on the quiz show and won decisively.
ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board
S. P. Somashekhar, M.-J. Sepúlveda, S. Puglielli, A. D. Norden, E. H. Shortliffe, C. Rohit Kumar, A. Rauthan, N. Arun Kumar, P. Patil, K. Rhee & Y. Ramya
Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; IBM Research (Retired), Yorktown Heights; Watson Health, IBM Corporation, Cambridge; Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA
Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com
Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug
approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to
help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment
recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer.
Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the
Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in
2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not
agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered
concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases.
Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III
disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P = 0.02;
P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status
was not found to affect concordance.
Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases
examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not.
This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment
decision making, especially at centers where expert breast cancer resources are limited.
Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer,
concordance, multidisciplinary tumor board
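The concordance rule from Patients and methods (a case counts as concordant when WFO labels the tumor board's choice 'recommended' or 'for consideration') can be sketched directly; the case records below are invented, not the study's data:

```python
# Concordance as defined in the study: a tumor-board recommendation is
# concordant when WFO designates it "recommended" or "for consideration".
# All case records below are invented for illustration.
cases = [
    {"stage": "II", "wfo_label": "recommended"},
    {"stage": "IV", "wfo_label": "not recommended"},
    {"stage": "III", "wfo_label": "for consideration"},
    {"stage": "I", "wfo_label": "recommended"},
]

CONCORDANT = {"recommended", "for consideration"}

def concordance_rate(cases):
    """Fraction of cases whose WFO label counts as concordant."""
    hits = sum(c["wfo_label"] in CONCORDANT for c in cases)
    return hits / len(cases)

rate = concordance_rate(cases)
print(f"{rate:.0%}")  # 3 of 4 invented cases are concordant
```

The paper's subgroup analysis is the same computation restricted to strata such as stage or age group.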
Introduction
Oncologists who treat breast cancer are challenged by a large and rapidly expanding knowledge base [1, 2]. As of October 2017, for example, there were 69 FDA-approved drugs for the treatment of breast cancer, not including combination treatment regimens [3]. The growth of massive genetic and clinical databases, along with computing systems to exploit them, will accelerate the speed of breast cancer treatment advances and shorten the cycle time for changes to breast cancer treatment guidelines [4, 5]. In addition, these information management challenges in cancer care are occurring in a practice environment where there is little time available for tracking and accessing relevant information at the point of care [6]. For example, a study that surveyed 1117 oncologists reported that on average 4.6 h per week were spent keeping
© The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Annals of Oncology 29: 418–423, 2018
doi:10.1093/annonc/mdx781
Published online 9 January 2018
Downloaded from https://academic.oup.com/annonc/article-abstract/29/2/418/4781689
by guest
WFO currently lacks evidence of accuracy and utility,
but will that still be true
ten years from now?
ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment
recommendations: agreement with an expert
multidisciplinary tumor board
S. P. Somashekhar1*, M.-J. Sepúlveda2, S. Puglielli3, A. D. Norden3, E. H. Shortliffe4, C. Rohit Kumar1, A. Rauthan1, N. Arun Kumar1, P. Patil1, K. Rhee3 & Y. Ramya1
1Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; 2IBM Research (Retired), Yorktown Heights; 3Watson Health, IBM Corporation, Cambridge; 4Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA
*Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com
Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug
approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to
help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment
recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer.
Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the
Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in
2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not
agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered
concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases.
Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III
disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P = 0.02;
P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status
was not found to affect concordance.
Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases
examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not.
This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment
decision making, especially at centers where expert breast cancer resources are limited.
Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer,
concordance, multidisciplinary tumor board
Introduction
Oncologists who treat breast cancer are challenged by a large and
rapidly expanding knowledge base [1, 2]. As of October 2017, for
example, there were 69 FDA-approved drugs for the treatment of
breast cancer, not including combination treatment regimens
[3]. The growth of massive genetic and clinical databases, along
with computing systems to exploit them, will accelerate the speed
of breast cancer treatment advances and shorten the cycle time
for changes to breast cancer treatment guidelines [4, 5]. In add-
ition, these information management challenges in cancer care
are occurring in a practice environment where there is little time
available for tracking and accessing relevant information at the
point of care [6]. For example, a study that surveyed 1117 oncolo-
gists reported that on average 4.6 h per week were spent keeping
© The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
All rights reserved. For permissions, please email: journals.permissions@oup.com.
Annals of Oncology 29: 418–423, 2018
doi:10.1093/annonc/mdx781
Published online 9 January 2018
Table 2. MMDT and WFO recommendations after the initial and blinded second reviews (N = 638)

Review of breast cancer cases       | Recommended | For consideration | Concordant total | Not recommended | Not available | Non-concordant total
Initial review (T1MMDT vs T2WFO)    | 296 (46%)   | 167 (26%)         | 463 (73%)        | 137 (21%)       | 38 (6%)       | 175 (27%)
Second review (T2MMDT vs T2WFO)     | 397 (62%)   | 194 (30%)         | 591 (93%)        | 36 (5%)         | 11 (2%)       | 47 (7%)

T1MMDT, original MMDT recommendation from 2014 to 2016; T2WFO, WFO advisor treatment recommendation in 2016; T2MMDT, MMDT treatment recommendation in 2016; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.
[Figure 1 bar chart: overall concordance 93% (n=638); by stage: stage I 80% (n=61), stage II 97% (n=262), stage III 95% (n=191), stage IV 86% (n=124). Bars are subdivided into 'recommended', 'for consideration', 'not recommended', and 'not available'.]
Figure 1. Treatment concordance between WFO and the MMDT overall and by stage. MMDT, Manipal multidisciplinary tumor board; WFO,
Watson for Oncology.
[Figure 2 bar chart: concordance by receptor status (HR+, HER2/neu+, triple-negative) and metastatic vs. non-metastatic disease, ranging from 75% to 98%. Bars are subdivided into 'recommended', 'for consideration', 'not recommended', and 'not applicable'.]
Figure 2. Treatment concordance between WFO and the MMDT by stage and receptor status. HER2/neu, human epidermal growth factor
receptor 2; HR, hormone receptor; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.
Annals of Oncology Original article
IBM Watson Health
Watson for Clinical Trial Matching (CTM)
1. According to the National Comprehensive Cancer Network (NCCN)
2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf
© 2015 International Business Machines Corporation
Current challenges
• Searching across the eligibility criteria of clinical trials is time consuming and labor intensive
• Fewer than 5% of adult cancer patients participate in clinical trials1
• 37% of sites fail to meet minimum enrollment targets, and 11% of sites fail to enroll a single patient2
The Watson solution
• Uses structured and unstructured patient data to quickly check eligibility across relevant clinical trials
• Provides eligible trial considerations ranked by relevance
• Increases speed to qualify patients
Clinical Investigators (Opportunity)
• Trials to patient: perform feasibility analysis for a trial
• Identify sites with the most potential for patient enrollment
• Optimize inclusion/exclusion criteria in protocols
• Result: faster, more efficient recruitment strategies and better designed protocols
Point of Care (Offering)
• Patient to trials: quickly find the right trial that a patient might be eligible for amongst hundreds of open trials available
• Result: improved patient care quality and consistency, increased efficiency
• Predicting whether a first cardiovascular event will occur within the next 10 years
• Prospective cohort study: 378,256 patients in the UK
• The first large-scale study to predict disease with machine learning from routine clinical data
• Compared the accuracy of the existing ACC/AHA guidelines against four machine-learning algorithms:
• Random forest; logistic regression; gradient boosting; neural network
• In January 2018, Google announced an AI that analyzes electronic medical records (EMR) to predict patient outcomes:
• whether the patient will die during the hospital stay
• whether the stay will be prolonged
• whether the patient will be readmitted within 30 days of discharge
• the diagnoses at discharge
• The distinguishing feature of this study: scalability
• Unlike previous studies, no subset of the EMR data was pre-processed;
• the entire EMR was analyzed as a whole, at UCSF and UCM (University of Chicago Medicine)
• Notably, unstructured data, namely physicians' clinical notes, was analyzed as well
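The "whole-record" idea can be sketched roughly as follows: instead of curating a handful of variables per outcome, every recorded event, including free-text notes, is kept as one chronological token sequence per patient. The `Event` fields and example values below are hypothetical illustrations, not the FHIR resources the paper actually uses:

```python
# Minimal sketch of a sequential patient representation, assuming a
# simplified event model (not the actual FHIR-based format).
from dataclasses import dataclass

@dataclass
class Event:
    time: float  # hours since admission
    kind: str    # e.g. "lab", "med", "note"
    value: str

def to_sequence(events):
    """Order all raw events chronologically; no per-site feature curation."""
    return sorted(events, key=lambda e: e.time)

record = [
    Event(5.0, "lab", "creatinine=1.4"),
    Event(1.0, "note", "admitted with pneumonia"),
    Event(3.5, "med", "ceftriaxone 2g IV"),
]
print([e.kind for e in to_sequence(record)])  # chronological order
```

The payoff of this representation is that the same pipeline can serve multiple prediction tasks and multiple hospitals without site-specific harmonization.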
ARTICLE OPEN
Scalable and accurate deep learning with electronic health
records
Alvin Rajkomar1,2, Eyal Oren1, Kai Chen1, Andrew M. Dai1, Nissan Hajaj1, Michaela Hardt1, Peter J. Liu1, Xiaobing Liu1, Jake Marcus1, Mimi Sun1, Patrik Sundberg1, Hector Yee1, Kun Zhang1, Yi Zhang1, Gerardo Flores1, Gavin E. Duggan1, Jamie Irvine1, Quoc Le1, Kurt Litsch1, Alexander Mossin1, Justin Tansuwan1, De Wang1, James Wexler1, Jimbo Wilson1, Dana Ludwig2, Samuel L. Volchenboum3, Katherine Chou1, Michael Pearson1, Srinivasan Madabushi1, Nigam H. Shah4, Atul J. Butte2, Michael D. Howell1, Claire Cui1, Greg S. Corrado1 and Jeffrey Dean1
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare
quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR
data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation
of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that
deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple
centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic
medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR
data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for
tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day
unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge
diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases.
We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case
study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the
patient’s chart.
npj Digital Medicine (2018)1:18 ; doi:10.1038/s41746-018-0029-1
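The AUROC figures quoted in the abstract can be reproduced conceptually with the rank-sum (Mann-Whitney) formulation of the statistic: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The labels and scores below are toy data, not the paper's:

```python
# Rank-sum formulation of AUROC; ties count as half a win.
def auroc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auroc(y, s))  # 8 of 9 positive/negative pairs are correctly ordered
```
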
INTRODUCTION
The promise of digital medicine stems in part from the hope that,
by digitizing health data, we might more easily leverage computer
information systems to understand and improve care. In fact,
routinely collected patient healthcare data are now approaching
the genomic scale in volume and complexity.1
Unfortunately,
most of this information is not yet used in the sorts of predictive
statistical models clinicians might use to improve care delivery. It
is widely suspected that use of such efforts, if successful, could
provide major benefits not only for patient safety and quality but
also in reducing healthcare costs.2–6
In spite of the richness and potential of available data, scaling
the development of predictive models is difficult because, for
traditional predictive modeling techniques, each outcome to be
predicted requires the creation of a custom dataset with specific
variables.7
It is widely held that 80% of the effort in an analytic
model is preprocessing, merging, customizing, and cleaning
nurses, and other providers are included. Traditional modeling
approaches have dealt with this complexity simply by choosing a
very limited number of commonly collected variables to consider.7
This is problematic because the resulting models may produce
imprecise predictions: false-positive predictions can overwhelm
physicians, nurses, and other providers with false alarms and
concomitant alert fatigue,10
which the Joint Commission identified
as a national patient safety priority in 2014.11
False-negative
predictions can miss significant numbers of clinically important
events, leading to poor clinical outcomes.11,12
Incorporating the
entire EHR, including clinicians’ free-text notes, offers some hope
of overcoming these shortcomings but is unwieldy for most
predictive modeling techniques.
Recent developments in deep learning and artificial neural
networks may allow us to address many of these challenges and
unlock the information in the EHR. Deep learning emerged as the
preferred machine learning approach in machine perception
www.nature.com/npjdigitalmed
Three types of medical AI
• Analysis of complex medical data to derive insights
• Analysis and reading of medical imaging and pathology data
• Monitoring of continuous data for prediction and prevention
Deep Learning
http://theanalyticsstore.ie/deep-learning/
Radiologist
Detection of Diabetic Retinopathy
Skin Cancer
Digital Pathologist
http://www.rolls-royce.com/about/our-technology/enabling-technologies/engine-health-management.aspx#sense
250 sensors to monitor the “health” of the GE turbines
Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic
sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an
accelerometer in a watch [6,7]. In addition, a female's most fertile period can be identified with detailed body temperature tracking [8], while levels of mental
attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. Levels of social interaction (also known to a
PLOS Medicine 2016
Applications of AI in medicine
• Analysis of complex medical data to derive insights
• Analysis and reading of medical imaging and pathology data
• Monitoring of continuous data for prediction and prevention
Project Artemis at UOIT
Sugar.IQ
Based on a user's history of food intake, the resulting blood glucose changes, and insulin injections, Watson predicts how the user's blood glucose will change after a meal.
An Algorithm Based on Deep Learning for Predicting In-Hospital
Cardiac Arrest
Joon-myoung Kwon, MD;* Youngnam Lee, MS;* Yeha Lee, PhD; Seungwoo Lee, BS; Jinsik Park, MD, PhD
Background: In-hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional track-and-trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false-alarm rates.
We propose a deep learning–based early warning system that shows higher performance than the existing track-and-trigger
systems.
Methods and Results: This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July
2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to
January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the
secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver
operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index.
Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC:
0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest
algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the deep learning–
based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning
system, random forest, and logistic regression, respectively, at the same sensitivity.
Conclusions: An algorithm based on deep learning had high sensitivity and a low false-alarm rate for detection of patients with
cardiac arrest in the multicenter study. (J Am Heart Assoc. 2018;7:e008678. DOI: 10.1161/JAHA.118.008678.)
Key Words: artificial intelligence • cardiac arrest • deep learning • machine learning • rapid response system • resuscitation
In-hospital cardiac arrest is a major burden to public health,
which affects patient safety.1–3
More than a half of cardiac
arrests result from respiratory failure or hypovolemic shock,
and 80% of patients with cardiac arrest show signs of
deterioration in the 8 hours before cardiac arrest.4–9
However,
209 000 in-hospital cardiac arrests occur in the United States
each year, and the survival discharge rate for patients with
cardiac arrest is <20% worldwide.10,11
Rapid response systems
(RRSs) have been introduced in many hospitals to detect
cardiac arrest using the track-and-trigger system (TTS).12,13
Two types of TTS are used in RRSs. For the single-parameter
TTS (SPTTS), cardiac arrest is predicted if any single vital sign
(eg, heart rate [HR], blood pressure) is out of the normal
range.14
The aggregated weighted TTS calculates a weighted
score for each vital sign and then finds patients with cardiac
arrest based on the sum of these scores.15
The modified early warning score (MEWS) is one of the most widely used approaches among all aggregated weighted TTSs (Table 1);16 however, traditional TTSs including MEWS have limitations, with low sensitivity or high false-alarm rates.14,15,17 Sensitivity and false-alarm rate interact: increased sensitivity creates higher false-alarm rates and vice versa.
Current RRSs suffer from low sensitivity or a high false-
alarm rate. An RRS was used for only 30% of patients before
unplanned intensive care unit admission and was not used for
22.8% of patients, even if they met the criteria.18,19
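An aggregated weighted track-and-trigger score in the spirit of MEWS can be sketched as follows: each vital sign maps to a sub-score, and the sum triggers an alarm above a threshold. The bands and threshold below are illustrative placeholders, not the published MEWS table:

```python
# MEWS-style aggregated weighted score; bands are invented for illustration.
def band_score(value, bands):
    """bands: list of (upper_bound, score) pairs, checked in ascending order."""
    for upper, score in bands:
        if value <= upper:
            return score
    return bands[-1][1]

HR_BANDS = [(40, 2), (50, 1), (100, 0), (110, 1), (130, 2), (float("inf"), 3)]
RR_BANDS = [(8, 2), (14, 0), (20, 1), (29, 2), (float("inf"), 3)]

def mews_like(heart_rate, resp_rate, threshold=4):
    """Return (total score, alarm?) for two vitals; real MEWS uses more."""
    total = band_score(heart_rate, HR_BANDS) + band_score(resp_rate, RR_BANDS)
    return total, total >= threshold

print(mews_like(125, 24))  # (4, True)
```

The deep learning-based system replaces this fixed band-and-threshold table with a recurrent network over the same vitals, which is what buys the higher sensitivity at a given alarm rate.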
From the Departments of Emergency Medicine (J.-m.K.) and Cardiology (J.P.), Mediplex Sejong Hospital, Incheon, Korea; VUNO, Seoul, Korea (Youngnam L., Yeha L.,
S.L.).
*Dr Kwon and Mr Youngnam Lee contributed equally to this study.
Correspondence to: Joon-myoung Kwon, MD, Department of Emergency medicine, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon 21080,
Korea. E-mail: kwonjm@sejongh.co.kr
Received January 18, 2018; accepted May 31, 2018.
ª 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley. This is an open access article under the terms of the Creative Commons
Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for
commercial purposes.
DOI: 10.1161/JAHA.118.008678 Journal of the American Heart Association 1
ORIGINAL RESEARCH
• Number of patients: 86,290
• Cardiac arrests: 633
• Input: heart rate, respiratory rate, body temperature, systolic blood pressure
(source: VUNO)
Cardiac Arrest Prediction Accuracy
• The accuracy gap is even larger at the alarm volumes (points A and B) that a university hospital rapid response team can realistically handle:
• A: DEWS 33.0%, MEWS 0.3%
• B: DEWS 42.7%, MEWS 4.0%
(source: VUNO)
APPH (Alarms Per Patient Per Hour)
(source: VUNO)
Fewer False Alarms
(source: VUNO)
Change in DEWS predictions over time
Cardiogram
• Cardiogram, a Silicon Valley startup, builds its service on heart rate data measured with the Apple Watch
• In October 2016 it raised $2M from Andreessen Horowitz
Passive Detection of Atrial Fibrillation
Using a Commercially Available Smartwatch
Geoffrey H. Tison, MD, MPH; José M. Sanchez, MD; Brandon Ballinger, BS; Avesh Singh, MS; Jeffrey E. Olgin, MD;
Mark J. Pletcher, MD, MPH; Eric Vittinghoff, PhD; Emily S. Lee, BA; Shannon M. Fan, BA; Rachel A. Gladstone, BA;
Carlos Mikell, BS; Nimit Sohoni, BS; Johnson Hsieh, MS; Gregory M. Marcus, MD, MAS
IMPORTANCE Atrial fibrillation (AF) affects 34 million people worldwide and is a leading cause
of stroke. A readily accessible means to continuously monitor for AF could prevent large
numbers of strokes and death.
OBJECTIVE To develop and validate a deep neural network to detect AF using smartwatch
data.
DESIGN, SETTING, AND PARTICIPANTS In this multinational cardiovascular remote cohort study
coordinated at the University of California, San Francisco, smartwatches were used to obtain
heart rate and step count data for algorithm development. A total of 9750 participants
enrolled in the Health eHeart Study and 51 patients undergoing cardioversion at the
University of California, San Francisco, were enrolled between February 2016 and March 2017.
A deep neural network was trained using a method called heuristic pretraining in which the
network approximated representations of the R-R interval (ie, time between heartbeats)
without manual labeling of training data. Validation was performed against the reference
standard 12-lead electrocardiography (ECG) in a separate cohort of patients undergoing
cardioversion. A second exploratory validation was performed using smartwatch data from
ambulatory individuals against the reference standard of self-reported history of persistent
AF. Data were analyzed from March 2017 to September 2017.
MAIN OUTCOMES AND MEASURES The sensitivity, specificity, and receiver operating
characteristic C statistic for the algorithm to detect AF were generated based on the
reference standard of 12-lead ECG–diagnosed AF.
RESULTS Of the 9750 participants enrolled in the remote cohort, including 347 participants
with AF, 6143 (63.0%) were male, and the mean (SD) age was 42 (12) years. There were more
than 139 million heart rate measurements on which the deep neural network was trained. The
deep neural network exhibited a C statistic of 0.97 (95% CI, 0.94-1.00; P < .001) to detect AF
against the reference standard 12-lead ECG–diagnosed AF in the external validation cohort of
51 patients undergoing cardioversion; sensitivity was 98.0% and specificity was 90.2%. In an
exploratory analysis relying on self-report of persistent AF in ambulatory participants, the C
statistic was 0.72 (95% CI, 0.64-0.78); sensitivity was 67.7% and specificity was 67.6%.
CONCLUSIONS AND RELEVANCE This proof-of-concept study found that smartwatch
photoplethysmography coupled with a deep neural network can passively detect AF but with
some loss of sensitivity and specificity against a criterion-standard ECG. Further studies will
help identify the optimal role for smartwatch-guided rhythm assessment.
JAMA Cardiol. doi:10.1001/jamacardio.2018.0136
Published online March 21, 2018.
Editorial
Supplemental content and
Audio
Author Affiliations: Division of
Cardiology, Department of Medicine,
University of California, San Francisco
(Tison, Sanchez, Olgin, Lee, Fan,
Gladstone, Mikell, Marcus);
Cardiogram Incorporated, San
Francisco, California (Ballinger, Singh,
Sohoni, Hsieh); Department of
Epidemiology and Biostatistics,
University of California, San Francisco
(Pletcher, Vittinghoff).
Corresponding Author: Gregory M.
Marcus, MD, MAS, Division of
Cardiology, Department of Medicine,
University of California, San
Francisco, 505 Parnassus Ave,
M1180B, San Francisco, CA 94143-
0124 (marcusg@medicine.ucsf.edu).
Research
JAMA Cardiology | Original Investigation
(Reprinted) E1
© 2018 American Medical Association. All rights reserved.
• ZIO Patch
• A medical device cleared by the FDA in 2009
• Worn for up to two weeks to continuously record the ECG
Cardiologist-Level Arrhythmia Detection
with Convolutional Neural Networks
Class-level F1 score    Sequence             Set
                        Model    Cardiol.    Model    Cardiol.
AFIB                    0.604    0.515       0.667    0.544
AFL                     0.687    0.635       0.679    0.646
AVB TYPE2               0.689    0.535       0.656    0.529
BIGEMINY                0.897    0.837       0.870    0.849
CHB                     0.843    0.701       0.852    0.685
EAR                     0.519    0.476       0.571    0.529
IVR                     0.761    0.632       0.774    0.720
JUNCTIONAL              0.670    0.684       0.783    0.674
NOISE                   0.823    0.768       0.704    0.689
SINUS                   0.879    0.847       0.939    0.907
SVT                     0.477    0.449       0.658    0.556
TRIGEMINY               0.908    0.843       0.870    0.816
VT                      0.506    0.566       0.694    0.769
WENCKEBACH              0.709    0.593       0.806    0.736
Aggregate results
Precision (PPV)         0.800    0.723       0.809    0.763
Recall (Sensitivity)    0.784    0.724       0.827    0.744
F1                      0.776    0.719       0.809    0.751

Table 1. The top part of the table gives a class-level comparison of the expert to the model F1 score for both the Sequence and the Set tasks. In both cases, the F1 score is computed for each class separately; the overall F1 (and precision and recall) are computed as a frequency-weighted mean.
Model vs. Cardiologist Performance
Each of the records in the test set has a ground-truth label from a committee of three cardiologists, as well as individual labels from a disjoint set of cardiologists. To assess cardiologist performance, the individual cardiologist scores are averaged, using the group label as the ground truth. Table 1 shows the breakdown of both the cardiologist and model scores across the different rhythm classes. The model outperforms the average cardiologist on most rhythms, noticeably outperforming the cardiologists in the AV Block set of arrhythmias: Mobitz I (Wenckebach), Mobitz II (AVB Type 2), and complete heart block (CHB). This is especially important given the severity of Mobitz II and complete heart block and the importance of distinguishing these two.
How to use AI (1): demonstrating the synergy of physicians plus AI
• An AI that reads hand radiographs to estimate a patient's bone age
• Conventionally, physicians estimate bone age by comparing the radiograph against standard images, e.g. with the Greulich-Pyle method
• The AI finds sex- and age-specific patterns in reference-standard images, expresses similarity as probabilities, and retrieves the matching standard images
• This can help physicians diagnose precocious puberty or growth retardation
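The matching step described above can be illustrated with a toy sketch: score a radiograph's feature vector against reference standards per sex and bone age, and turn similarities into a probability ranking. This is not VUNO's actual model; the atlas entries, feature vectors, and similarity function are all invented:

```python
# Toy Greulich-Pyle-style matching against a hypothetical atlas.
import math

# Hypothetical atlas: (sex, bone age in years) -> reference feature vector
ATLAS = {
    ("M", 10): [0.2, 0.8],
    ("M", 11): [0.4, 0.7],
    ("M", 12): [0.6, 0.5],
}

def rank_bone_ages(features, sex):
    """Softmax over negative distances to each same-sex reference standard,
    yielding a probability per candidate bone age."""
    dists = {age: math.dist(features, vec)
             for (s, age), vec in ATLAS.items() if s == sex}
    exps = {age: math.exp(-d) for age, d in dists.items()}
    z = sum(exps.values())
    probs = {age: e / z for age, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: -kv[1])

ranking = rank_bone_ages([0.55, 0.55], "M")
print(ranking[0][0])  # most similar reference bone age
```

Presenting the ranked candidates with their probabilities, rather than a single answer, is what lets the physician stay in the loop, which matters for the synergy results that follow.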
AJR:209, December 2017 1
Since 1992, concerns regarding interob-
server variability in manual bone age esti-
mation [4] have led to the establishment of
several automatic computerized methods for
bone age estimation, including computer-as-
sisted skeletal age scores, computer-aided
skeletal maturation assessment systems, and
BoneXpert (Visiana) [5–14]. BoneXpert was
developed according to traditional machine-
learning techniques and has been shown to
have a good performance for patients of var-
ious ethnicities and in various clinical set-
tings [10–14]. The deep-learning technique
is an improvement in artificial neural net-
works. Unlike traditional machine-learning
techniques, deep-learning techniques allow
an algorithm to program itself by learning
from the images given a large dataset of la-
beled examples, thus removing the need to
specify rules [15].
Deep-learning techniques permit higher
levels of abstraction and improved predic-
tions from data. Deep-learning techniques
Computerized Bone Age
Estimation Using Deep Learning–
Based Program: Evaluation of the
Accuracy and Efficiency
Jeong Rye Kim1, Woo Hyun Shim1, Hee Mang Yoon1, Sang Hyup Hong1, Jin Seong Lee1, Young Ah Cho1, Sangki Kim2
Kim JR, Shim WH, Yoon HM, et al.
1Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea. Address correspondence to H. M. Yoon (espoirhm@gmail.com).
2Vuno Research Center, Vuno Inc., Seoul, South Korea.
Pediatric Imaging • Original Research
Supplemental Data
Available online at www.ajronline.org.
AJR 2017; 209:1–7
0361–803X/17/2096–1
© American Roentgen Ray Society
Bone age estimation is crucial for developmental status determinations and ultimate height predictions in the pediatric population, particularly for patients with growth disorders and endocrine abnormalities [1]. Two
ders and endocrine abnormalities [1]. Two
major left-hand wrist radiograph-based
methods for bone age estimation are current-
ly used: the Greulich-Pyle [2] and Tanner-
Whitehouse [3] methods. The former is much
more frequently used in clinical practice.
Greulich-Pyle–based bone age estimation is
performed by comparing a patient’s left-hand
radiograph to standard radiographs in the
Greulich-Pyle atlas and is therefore simple
and easily applied in clinical practice. How-
ever, the process of bone age estimation,
which comprises a simple comparison of
multiple images, can be repetitive and time
consuming and is thus sometimes burden-
some to radiologists. Moreover, the accuracy
depends on the radiologist’s experience and
tends to be subjective.
Keywords: bone age, children, deep learning, neural
network model
DOI:10.2214/AJR.17.18224
J. R. Kim and W. H. Shim contributed equally to this work.
Received March 12, 2017; accepted after revision
July 7, 2017.
S. Kim is employed by Vuno, Inc., which created the deep
learning–based automatic software system for bone
age determination. J. R. Kim, W. H. Shim, H. M. Yoon,
S. H. Hong, J. S. Lee, and Y. A. Cho are employed by
Asan Medical Center, which holds patent rights for the
deep learning–based automatic software system for
bone age assessment.
OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a
new automatic software system for bone age assessment and to validate its feasibility in clini-
cal practice.
MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning tech-
nique was used to develop the automatic software system for bone age determination. Using
this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years
old) using first-rank bone age (software only), computer-assisted bone age (two radiologists
with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with
Greulich-Pyle atlas assistance only). The reference bone age was determined by the consen-
sus of two experienced radiologists.
RESULTS. First-rank bone ages determined by the automatic software system showed a
69.5% concordance rate and significant correlations with the reference bone age (r = 0.992;
p < 0.001). Concordance rates increased with the use of the automatic software system for
both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-as-
sisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for
computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers
1 and 2, respectively.
CONCLUSION. Automatic software system showed reliably accurate bone age estima-
tions and appeared to enhance efficiency by reducing reading times without compromising
the diagnostic accuracy.
Kim et al.
Accuracy and Efficiency of Computerized Bone Age Estimation
Pediatric Imaging
Original Research
Downloadedfromwww.ajronline.orgbyFloridaAtlanticUnivon09/13/17fromIPaddress131.91.169.193.CopyrightARRS.Forpersonaluseonly;allrightsreserved
• Total number of patients: 200
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• Physician A: radiologist subspecialized in pediatric imaging (over 500 reads)
• Physician B: second-year radiology resident (one day of training in the method plus 20 reads)
• AI: VUNO's deep learning model for bone age reading
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
[Bar chart: bone age reading accuracy. AI 69.5%; physician A (radiology fellow subspecialized in pediatric imaging) 63%; physician B (second-year radiology resident) 49.5%.]
AI vs. physicians
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
Synergistic effect of human physicians and AI in bone age reading
Digital Healthcare Institute
Director, Yoon Sup Choi, PhD
yoonsup.choi@gmail.com
[Bar chart: bone age reading accuracy with and without AI assistance. AI alone 69.5%; physician A (pediatric imaging fellow) 63%, rising to 72.5% with AI; physician B (second-year radiology resident) 49.5%, rising to 57.5% with AI.]
AI vs. physicians, and AI + physicians
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
[Bar chart: total reading time. Physician A: 188 min without AI vs. 154 min with AI (an 18% saving); physician B: 180 min without AI vs. 108 min with AI (a 40% saving).]
Using AI for bone age reading can also reduce reading time.
ORIGINAL RESEARCH • THORACIC IMAGING
Development and Validation of Deep
Learning–based Automatic Detection
Algorithm for Malignant Pulmonary Nodules
on Chest Radiographs
Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD,
PhD • KunYoung Lim, MD, PhD • Thienkai HuyVu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin
Mo Goo, MD, PhD • Chang Min Park, MD, PhD
From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul
03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital,
Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of
Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco,
San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul,
Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P.
(e-mail: cmpark.morphius@gmail.com).
Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002).
*J.G.N. and S.P. contributed equally to this work.
Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237 • Content codes:
Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules
on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods: For this retrospective study, DLAD was developed by using 43292 chest radiographs (normal radiograph–
to–nodule radiograph ratio, 34067:9225) in 34676 patients (healthy-to-nodule ratio, 30784:3892; 19230 men [mean age, 52.8
years; age range, 18–99 years]; 15446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015,
which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph clas-
sification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three
South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection
performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife
alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance
test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation
data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results: According to one internal and four external validation data sets, radiograph classification and nodule detection perfor-
mances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher
AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and
all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range,
0.006–0.190; P < .05).
Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nod-
ule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when
used as a second reader.
©RSNA, 2018
Online supplemental material is available for this article.
• 43,292 chest PA radiographs (normal : nodule = 34,067 : 9,225)
• Labeled and partially annotated by 13 board-certified radiologists
• DLAD was validated on 1 internal + 4 external datasets:
• Seoul National University Hospital / Boramae Medical Center / National Cancer Center / UCSF
• Tasks: radiograph classification / lesion localization
• AI vs. physicians vs. AI + physicians
• Compared against physicians at various levels:
• non-radiology physicians / radiology residents
• board-certified radiologists / thoracic radiologists
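AUROC, the classification metric used in this study, equals the probability that a randomly chosen positive (nodule) radiograph receives a higher model score than a randomly chosen negative (normal) one; a dependency-free sketch with made-up scores (not study data):

```python
def auroc(pos_scores, neg_scores):
    """AUROC via the Mann-Whitney formulation: P(pos > neg) + 0.5 * P(tie) over all pairs."""
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Hypothetical model confidences for nodule (positive) and normal (negative) films
print(auroc([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.8, 0.2]))  # -> 0.84375
```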
Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adenocarcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an elevation in its confidence by eight radiologists.
Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two radiologists and an elevated confidence level of the nodule by two radiologists.
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test

| Observer | Test 1 AUROC | Test 1 JAFROC FOM | DLAD vs. Test 1 P (classification / detection) | Test 2 AUROC | Test 2 JAFROC FOM | Test 1 vs. Test 2 P (classification / detection) |
|---|---|---|---|---|---|---|
| Nonradiology physicians | | | | | | |
| Observer 1 | 0.77 | 0.716 | <.001 / <.001 | 0.91 | 0.853 | <.001 / <.001 |
| Observer 2 | 0.78 | 0.657 | <.001 / <.001 | 0.90 | 0.846 | <.001 / <.001 |
| Observer 3 | 0.80 | 0.700 | <.001 / <.001 | 0.88 | 0.783 | <.001 / <.001 |
| Group | | 0.691 | — / <.001* | | 0.828 | — / <.001* |
| Radiology residents | | | | | | |
| Observer 4 | 0.78 | 0.767 | <.001 / <.001 | 0.80 | 0.785 | .02 / .03 |
| Observer 5 | 0.86 | 0.772 | .001 / <.001 | 0.91 | 0.837 | .02 / <.001 |
| Observer 6 | 0.86 | 0.789 | .05 / .002 | 0.86 | 0.799 | .08 / .54 |
| Observer 7 | 0.84 | 0.807 | .01 / .003 | 0.91 | 0.843 | .003 / .02 |
| Observer 8 | 0.87 | 0.797 | .10 / .003 | 0.90 | 0.845 | .03 / .001 |
| Observer 9 | 0.90 | 0.847 | .52 / .12 | 0.92 | 0.867 | .04 / .03 |
| Group | | 0.790 | — / <.001* | | 0.867 | — / <.001* |
| Board-certified radiologists | | | | | | |
| Observer 10 | 0.87 | 0.836 | .05 / .01 | 0.90 | 0.865 | .004 / .002 |
| Observer 11 | 0.83 | 0.804 | <.001 / <.001 | 0.84 | 0.817 | .03 / .04 |
| Observer 12 | 0.88 | 0.817 | .18 / .005 | 0.91 | 0.841 | .01 / .01 |
| Observer 13 | 0.91 | 0.824 | >.99 / .02 | 0.92 | 0.836 | .51 / .24 |
| Observer 14 | 0.88 | 0.834 | .14 / .03 | 0.88 | 0.840 | .87 / .23 |
| Group | | 0.821 | — / .02* | | 0.840 | — / .01* |
| Thoracic radiologists | | | | | | |
| Observer 15 | 0.94 | 0.856 | .15 / .21 | 0.96 | 0.878 | .08 / .03 |
| Observer 16 | 0.92 | 0.854 | .60 / .17 | 0.93 | 0.872 | .34 / .02 |
| Observer 17 | 0.86 | 0.820 | .02 / .01 | 0.88 | 0.838 | .14 / .12 |
| Observer 18 | 0.84 | 0.800 | <.001 / <.001 | 0.87 | 0.827 | .02 / .02 |
| Group | | 0.833 | — / .08* | | 0.854 | — / <.001* |

Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years; observers 7–9 had 3 years; observers 10–12 had 7 years; observers 13 and 14 had 8 years; observer 15 had 26 years; observer 16 had 13 years; and observers 17 and 18 had 9 years. Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine.
[Table 3, annotated] Column labels: physicians; AI vs. physicians alone (p value); physicians + AI; physicians vs. physicians + AI (p value). Row groups: non-radiology physicians (4th-year residents in obstetrics & gynecology, orthopedic surgery, and internal medicine); radiology residents (1st-, 2nd-, and 3rd-year); board-certified radiologists (7 and 8 years of experience); thoracic radiologists (26, 13, and 9 years of experience).
• Using AI as a second reader improved accuracy
• Classification: 17 of 18 physicians improved (15 of 18 with P < 0.05)
• Nodule detection: 18 of 18 physicians improved (14 of 18 with P < 0.05)
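One way to operationalize "AI as a second reader" is to queue for re-review any case the physician called negative but the model scored above a cutoff; a minimal sketch with hypothetical field names and threshold (the paper does not prescribe this interface):

```python
def second_read_queue(cases, threshold=0.5):
    """Return cases the physician read as negative but the AI flags for re-review."""
    return [c for c in cases if not c["reader_positive"] and c["ai_score"] >= threshold]

cases = [
    {"id": 1, "reader_positive": True,  "ai_score": 0.9},  # already called positive
    {"id": 2, "reader_positive": False, "ai_score": 0.7},  # potential miss -> re-review
    {"id": 3, "reader_positive": False, "ai_score": 0.1},  # concordant negative
]
print([c["id"] for c in second_read_queue(cases)])  # -> [2]
```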
https://www.facebook.com/groups/TensorFlowKR/permalink/633902253617503/
Google engineers delivered a keynote on medical AI at AACR 2018
Impact of Deep Learning Assistance on the
Histopathologic Review of Lymph Nodes for Metastatic
Breast Cancer
David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,*
Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,†
Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD*
Abstract: Advances in the quality of whole-slide images have set the
stage for the clinical use of digital images in anatomic pathology.
Along with advances in computer image analysis, this raises the
possibility for computer-assisted diagnostics in pathology to improve
histopathologic interpretation and clinical care. To evaluate the
potential impact of digital assistance on interpretation of digitized
slides, we conducted a multireader multicase study utilizing our deep
learning algorithm for the detection of breast cancer metastasis in
lymph nodes. Six pathologists reviewed 70 digitized slides from lymph
node sections in 2 reader modes, unassisted and assisted, with a wash-
out period between sessions. In the assisted mode, the deep learning
algorithm was used to identify and outline regions with high like-
lihood of containing tumor. Algorithm-assisted pathologists demon-
strated higher accuracy than either the algorithm or the pathologist
alone. In particular, algorithm assistance significantly increased the
sensitivity of detection for micrometastases (91% vs. 83%, P=0.02).
In addition, average review time per image was significantly shorter
with assistance than without assistance for both micrometastases (61
vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding
the difficulty of each image classification. On the basis of this score,
pathologists considered the image review of micrometastases to be
significantly easier when interpreted with assistance (P=0.0005).
Utilizing a proof of concept assistant tool, this study demonstrates the
potential of a deep learning algorithm to improve pathologist accu-
racy and efficiency in a digital pathology workflow.
Key Words: artificial intelligence, machine learning, digital pathology,
breast cancer, computer aided detection
(Am J Surg Pathol 2018;00:000–000)
The regulatory approval and gradual implementation of
whole-slide scanners has enabled the digitization of glass
slides for remote consults and archival purposes.1 Digitiza-
tion alone, however, does not necessarily improve the con-
sistency or efficiency of a pathologist’s primary workflow. In
fact, image review on a digital medium can be slightly
slower than on glass, especially for pathologists with limited
digital pathology experience.2 However, digital pathology
and image analysis tools have already demonstrated po-
tential benefits, including the potential to reduce inter-reader
variability in the evaluation of breast cancer HER2 status.3,4
Digitization also opens the door for assistive tools based on
Artificial Intelligence (AI) to improve efficiency and con-
sistency, decrease fatigue, and increase accuracy.5
Among AI technologies, deep learning has demon-
strated strong performance in many automated image-rec-
ognition applications.6–8 Recently, several deep learning–
based algorithms have been developed for the detection of
breast cancer metastases in lymph nodes as well as for other
applications in pathology.9,10 Initial findings suggest that
some algorithms can even exceed a pathologist’s sensitivity
for detecting individual cancer foci in digital images. How-
ever, this sensitivity gain comes at the cost of increased false
positives, potentially limiting the utility of such algorithms for
automated clinical use.11 In addition, deep learning algo-
rithms are inherently limited to the task for which they have
been specifically trained. While we have begun to understand
the strengths of these algorithms (such as exhaustive search)
and their weaknesses (sensitivity to poor optical focus, tumor
mimics; manuscript under review), the potential clinical util-
ity of such algorithms has not been thoroughly examined.
While an accurate algorithm alone will not necessarily aid
pathologists or improve clinical interpretation, these benefits
may be achieved through thoughtful and appropriate in-
tegration of algorithm predictions into the clinical workflow.8
From the *Google AI Healthcare; and †Verily Life Sciences, Mountain
View, CA.
D.F.S., R.M., and Y.L. are co-first authors (equal contribution).
Work done as part of the Google Brain Healthcare Technology Fellowship
(D.F.S. and P.T.).
Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T.,
J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have
Alphabet stock.
Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare,
1600 Amphitheatre Way, Mountain View, CA 94043
(e-mail: davesteiner@google.com).
Supplemental Digital Content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF
versions of this article on the journal’s website, www.ajsp.com.
Copyright © 2018 The Author(s). Published by Wolters Kluwer Health,
Inc. This is an open-access article distributed under the terms of the
Creative Commons Attribution-Non Commercial-No Derivatives
License 4.0 (CCBY-NC-ND), where it is permissible to download and
share the work provided it is properly cited. The work cannot be
changed in any way or used commercially without permission from
the journal.
ORIGINAL ARTICLE
Am J Surg Pathol  Volume 00, Number 00, ’’ 2018 www.ajsp.com | 1
• LYNA (LYmph Node Assistant), a pathology AI developed by Google
• A study demonstrating the synergy of pathologist + AI for lymph node metastases of breast cancer
• Endpoints: accuracy (sensitivity) / review time / perceived reading difficulty (for micrometastases)
[Figure 3] Improved metastasis detection with algorithm assistance. A, Performance across all images by image category and assistance modality (specificity for negative cases; sensitivity for micrometastases and macrometastases); error bars indicate SE; micrometastasis sensitivity, p=0.02. B, Operating points of individual pathologists with and without assistance for micrometastases and negative cases, overlaid on the receiver operating characteristic curve of the algorithm (AUC = area under the curve).
• Sensitivity
• Significantly higher with AI assistance for micrometastases
• No significant difference for negative cases and macrometastases
• AUC
• Pathologist + AI slightly higher than
• either the pathologist alone or the algorithm alone
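Paired reader comparisons like this are commonly tested with McNemar's test on discordant pairs (lesions found in one modality but not the other); a plain-Python exact version with invented counts, not the study's data:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts b and c."""
    n = b + c
    k = min(b, c)
    # Double the tail probability of the smaller count under Binomial(n, 0.5)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(p, 1.0)

# e.g. 9 lesions detected only with assistance vs. 1 detected only without
print(mcnemar_exact(9, 1))  # -> 0.021484375
```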
Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists, and that even the best algorithms would need to be integrated into clinical workflows. Algorithm assistance improved the sensitivity of detection of micrometastases from 83% to 91% and resulted in higher overall diagnostic accuracy than that of either unassisted pathologist interpretation or the computer algorithm alone.
[Figure 5] Average review time per image decreases with assistance. A, Average review time per image across all pathologists by category (negative, ITC, micrometastasis, macrometastasis), with and without assistance; error bars indicate 95% confidence interval (negative, p=0.02; micrometastasis, p=0.002). B, Micrometastasis review time decreases for nearly all images with assistance; circles represent the average review time for each micrometastasis image across the 6 pathologists, dashed lines connect the same image with and without assistance, and the 2 images not reviewed faster on average with assistance are shown with red dot-dash lines.
• Review time (per image)
• Significantly shorter with AI assistance for negative and micrometastasis images
• For micrometastases, roughly halved: from about 2 minutes to about 1 minute
• Not significant for ITC (isolated tumor cells) and macrometastases
• Subjective reading difficulty
• Significantly easier only for micrometastases
• Not statistically significant for the other categories
Such algorithms must integrate into existing clinical workflows in order to improve patient care. In this proof-of-concept study, we investigated the impact of a computer assistance tool for the interpretation of digitized H&E slides, and show that a digital tool developed to assist with the identification of lymph node metastases can indeed augment the efficiency and accuracy of pathologists. In regards to accuracy, algorithm assistance improved the sensitivity of detection of micrometastases from 83% to 91%.
TABLE 3. Average Obviousness Scores to Assess the Difficulty of Each Case by Image Category and Assistance Modality

| Category (n images) | Unassisted | Assisted | P |
|---|---|---|---|
| Negative (24) | 67.5 (63.6–71.3) | 72.0 (68.7–75.3) | 0.29 |
| Isolated tumor cells (8) | 55.6 (47.7–63.5) | 50.4 (42.2–58.6) | 0.47 |
| Micrometastasis (19) | 63.1 (58.3–67.9) | 83.6 (80.3–86.9) | 0.0005 |
| Macrometastasis (19) | 90.1 (86.4–93.7) | 93.1 (90.0–96.1) | 0.16 |

Values are average obviousness scores (95% CI). Bold values indicate statistically significant differences.
How to put AI to use (2)
Blending into the workflow
Access to Pathology AI algorithms is limited
Adoption barriers for digital pathology
• Expensive scanners
• IT infrastructure required
• Disrupt existing workflows
• Not all clinical needs addressed (speed, focus, etc)
 
 
Figure 1: System overview. (1) Schematic sketch of the whole device. (2) A photo of the actual implementation.
An Augmented Reality Microscope for
Realtime Automated Detection of Cancer
https://research.googleblog.com/2018/04/an-augmented-reality-microscope.html
An Augmented Reality Microscope for Cancer Detection
https://www.youtube.com/watch?v=9Mz84cwVmS0
 
 
 
 
 
 
Figure 3: Sample views through the lens. 
Top: Lymph node metastasis detection at 4X, 10X, 20X, and 40X. 
Bottom: Prostate cancer detection at 4X, 10X, and 20X. 
An Augmented Reality Microscope for
Realtime Automated Detection of Cancer
https://research.googleblog.com/2018/04/an-augmented-reality-microscope.html
 
 
• PR quantification / Ki67 quantification / P53 quantification / CD8 quantification
• Mitosis counting on H&E slides
• Measurement of tumor size
• Identification of H. pylori / identification of Mycobacterium
• Identification of prostate cancer regions with estimation of percentage tumor involvement
https://research.googleblog.com/2018/04/an-augmented-reality-microscope.html
How to put AI to use (3)
A starting point for new medical research
• Predicting whether a first cardiovascular event will occur within the next 10 years
• Prospective cohort study of 378,256 UK patients
• The first large-scale study to predict disease via machine learning from routine clinical data
• Compared the accuracy of the established ACC/AHA guideline against four machine-learning algorithms:
• random forest, logistic regression, gradient boosting, and neural network
Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F. Weng et al., PLoS ONE 2017
[The ACC/AHA baseline model resulted] in a sensitivity of 62.7% and PPV of 17.1%. The random forest algorithm resulted in a net increase of 191 CVD cases from the baseline model, increasing the sensitivity to 65.3% and PPV to 17.8%, while logistic regression resulted in a net increase of 324 CVD cases (sensitivity 67.1%; PPV 18.3%). Gradient boosting machines and neural networks performed best, resulting in a net increase of 354 (sensitivity 67.5%; PPV 18.4%) and 355 (sensitivity 67.5%; PPV 18.4%) CVD cases correctly predicted, respectively.
The ACC/AHA baseline model correctly predicted 53,106 non-cases from 75,585 total non-
cases, resulting in a specificity of 70.3% and NPV of 95.1%. The net increase in non-cases
Table 3. Top 10 risk factor variables for CVD algorithms, listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines). Algorithms were derived from a training cohort of 295,267 patients.

| ACC/AHA (Men) | ACC/AHA (Women) | ML: Logistic Regression | ML: Random Forest | ML: Gradient Boosting Machines | ML: Neural Networks |
|---|---|---|---|---|---|
| Age | Age | Ethnicity | Age | Age | Atrial Fibrillation |
| Total Cholesterol | HDL Cholesterol | Age | Gender | Gender | Ethnicity |
| HDL Cholesterol | Total Cholesterol | SES: Townsend Deprivation Index | Ethnicity | Ethnicity | Oral Corticosteroid Prescribed |
| Smoking | Smoking | Gender | Smoking | Smoking | Age |
| Age × Total Cholesterol | Age × HDL Cholesterol | Smoking | HDL Cholesterol | HDL Cholesterol | Severe Mental Illness |
| Treated Systolic Blood Pressure | Age × Total Cholesterol | Atrial Fibrillation | HbA1c | Triglycerides | SES: Townsend Deprivation Index |
| Age × Smoking | Treated Systolic Blood Pressure | Chronic Kidney Disease | Triglycerides | Total Cholesterol | Chronic Kidney Disease |
| Age × HDL Cholesterol | Untreated Systolic Blood Pressure | Rheumatoid Arthritis | SES: Townsend Deprivation Index | HbA1c | BMI missing |
| Untreated Systolic Blood Pressure | Age × Smoking | Family History of Premature CHD | BMI | Systolic Blood Pressure | Smoking |
| Diabetes | Diabetes | COPD | Total Cholesterol | SES: Townsend Deprivation Index | Gender |

Italics: protective factors.
https://doi.org/10.1371/journal.pone.0174944.t003
• Only a subset of the risk factors in the existing ACC/AHA guideline were also selected by the machine-learning algorithms.
• Notably, diabetes was not included in any of the four models.
• New factors absent from existing risk-prediction tools were included:
• COPD, severe mental illness, prescribing of oral corticosteroids
• Biomarkers such as triglyceride level
Can machine-learning improve cardiovascular
risk prediction using routine clinical data?
Stephen F. Weng et al., PLoS ONE 2017
correctly predicted compared to the baseline ACC/AHA model ranged from 191 non-cases for
the random forest algorithm to 355 non-cases for the neural networks. Full details on classifi-
cation analysis can be found in S2 Table.
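The sensitivity, PPV, specificity, and NPV figures quoted in the excerpt all derive from the same four confusion-matrix counts. A minimal helper makes the arithmetic explicit; the worked example below uses the one fully specified count from the excerpt (53,106 of 75,585 non-cases correctly predicted), while the function itself is generic:

```python
def binary_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics behind the figures quoted above."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true CVD cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# The excerpt's fully specified count: the ACC/AHA baseline correctly
# predicted 53,106 of 75,585 total non-cases, i.e. specificity of 70.3%.
print(round(53106 / 75585 * 100, 1))  # → 70.3
```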
Discussion
Compared to an established AHA/ACC risk prediction algorithm, we found all machine-
learning algorithms tested were better at identifying individuals who will develop CVD and
those that will not. Unlike established approaches to risk prediction, the machine-learning
methods used were not limited to a small set of risk factors, and incorporated more pre-exist-
Table 4. Performance of the machine-learning (ML) algorithms predicting 10-year cardiovascular disease (CVD) risk derived from applying training algorithms on the validation cohort of 82,989 patients. Higher c-statistics result in better algorithm discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparative purposes.

| Algorithm | AUC c-statistic | Standard Error* | 95% CI (LCL–UCL) | Absolute Change from Baseline |
|---|---|---|---|---|
| BL: ACC/AHA | 0.728 | 0.002 | 0.723–0.735 | — |
| ML: Random Forest | 0.745 | 0.003 | 0.739–0.750 | +1.7% |
| ML: Logistic Regression | 0.760 | 0.003 | 0.755–0.766 | +3.2% |
| ML: Gradient Boosting Machines | 0.761 | 0.002 | 0.755–0.766 | +3.3% |
| ML: Neural Networks | 0.764 | 0.002 | 0.759–0.769 | +3.6% |

*Standard error estimated by jack-knife procedure [30]
https://doi.org/10.1371/journal.pone.0174944.t004
Can machine-learning improve cardiovascular risk prediction using routine clinical data?
• All four machine-learning models were more accurate than the existing ACC/AHA guideline.
• Neural networks were the most accurate, with AUC = 0.764.
• "Had this model been used, an additional 355 cardiovascular events could have been prevented."
• Accuracy could be improved further with deep learning.
• Additional risk factors, such as genetic information, could also be incorporated.
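The headline numbers in Table 4 are AUC c-statistics. As a reference for how that metric is computed, here is a minimal pure-Python sketch of the pair-counting form of the c-statistic, run on toy scores rather than the study's models:

```python
def c_statistic(scores, labels):
    """AUC as the probability that a randomly chosen case receives a higher
    predicted risk than a randomly chosen non-case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a model that ranks every case above every non-case has AUC 1.0;
# a random ranking hovers near 0.5.
print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0
```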
Fig. 1 (C-Path pipeline), panel text: (A) Basic image processing and feature construction: H&E image, image broken into superpixels, nuclei identified within each superpixel. (B) Building an epithelial/stromal classifier. (C) Constructing higher-level contextual/relational features: relationships between epithelial nuclear neighbors; between morphologically regular and irregular nuclei; between epithelial and stromal objects; between epithelial nuclei and cytoplasm; relationships of contiguous epithelial regions with underlying nuclear objects; characteristics of stromal nuclei and stromal matrix; characteristics of epithelial nuclei and epithelial cytoplasm. (D) Learning an image-based model to predict survival from processed patient images.
TMAs contain 0.6-mm-diameter cores (median
of two cores per case) that represent only a small
sample of the full tumor. We acquired data from
two separate and independent cohorts: Nether-
lands Cancer Institute (NKI; 248 patients) and
Vancouver General Hospital (VGH; 328 patients).
Unlike previous work in cancer morphom-
etry (18–21), our image analysis pipeline was
not limited to a predefined set of morphometric
features selected by pathologists. Rather, C-Path
measures an extensive, quantitative feature set
from the breast cancer epithelium and the stro-
ma (Fig. 1). Our image processing system first
performed an automated, hierarchical scene seg-
mentation that generated thousands of measure-
ments, including both standard morphometric
descriptors of image objects and higher-level
contextual, relational, and global image features.
The pipeline consisted of three stages (Fig. 1, A
to C, and tables S8 and S9). First, we used a set of
processing steps to separate the tissue from the
background, partition the image into small regions
of coherent appearance known as superpixels,
find nuclei within the superpixels, and construct
Fig. 1, continued: (D) Learning an image-based model to predict survival: processed images from patients alive at 5 years and from patients deceased at 5 years feed L1-regularized logistic regression model building, producing a 5YS predictive model that assigns P(survival) over time to unlabeled images and enables identification of novel prognostically important morphologic features.
basic cellular morphologic properties (epithelial regular nuclei = red; epithelial atypical nuclei = pale blue; epithelial cytoplasm = purple; stromal matrix = green; stromal round nuclei = dark green; stromal spindled nuclei = teal blue; unclassified regions = dark gray; spindled nuclei in unclassified regions = yellow; round nuclei in unclassified regions = gray; background = white). (Left panel) After the classification of each image object, a rich feature set is constructed. (D) Learning an image-based model to predict survival. Processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to construct an image-based prognostic model. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as high or low risk of death by 5 years.
www.ScienceTranslationalMedicine.org 9 November 2011 Vol 3 Issue 108 108ra113 2
Digital Pathologist
Sci Transl Med. 2011 Nov 9;3(108):108ra113
Top stromal features associated with survival.
primarily characterizing epithelial nuclear characteristics, such as
size, color, and texture (21, 36). In contrast, after initial filtering of im-
ages to ensure high-quality TMA images and training of the C-Path
models using expert-derived image annotations (epithelium and
stroma labels to build the epithelial-stromal classifier and survival
time and survival status to build the prognostic model), our image
analysis system is automated with no manual steps, which greatly in-
creases its scalability. Additionally, in contrast to previous approaches,
our system measures thousands of morphologic descriptors of diverse
identification of prognostic features whose significance was not pre-
viously recognized.
Using our system, we built an image-based prognostic model on
the NKI data set and showed that in this patient cohort the model
was a strong predictor of survival and provided significant additional
prognostic information to clinical, molecular, and pathological prog-
nostic factors in a multivariate model. We also demonstrated that the
image-based prognostic model, built using the NKI data set, is a strong
prognostic factor on another, independent data set with very different
Fig. 5, panels A to H: paired example images for each of the top-ranking epithelial features (SD of the ratio of the pixel-intensity SD to the mean intensity within a ring of the center of epithelial nuclei; sum of the number of unclassified objects; SD of the maximum blue pixel value for atypical epithelial nuclei; maximum distance between atypical epithelial nuclei; minimum elliptic fit of epithelial contiguous regions; SD of distance between epithelial cytoplasmic and nuclear objects; average border between epithelial cytoplasmic objects; maximum value of the minimum green pixel intensity value in epithelial contiguous regions).
Fig. 5. Top epithelial features. The eight panels in the figure (A to H) each shows one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis. (A) SD of the (SD of intensity/mean intensity) for pixels within a ring of the center of epithelial nuclei. Left, relatively consistent nuclear intensity pattern (low score); right, great nuclear intensity diversity (high score). (B) Sum of the number of unclassified objects. Red, epithelial regions; green, stromal regions; no overlaid color, unclassified region. Left, few unclassified objects (low score); right, higher number of unclassified objects (high score). (C) SD of the maximum blue pixel value for atypical epithelial nuclei. Left, high score; right, low score. (D) Maximum distance between atypical epithelial nuclei. Left, high score; right, low score. (Insets) Red, atypical epithelial nuclei; black, typical epithelial nuclei. (E) Minimum elliptic fit of epithelial contiguous regions. Left, high score; right, low score. (F) SD of distance between epithelial cytoplasmic and nuclear objects. Left, high score; right, low score. (G) Average border between epithelial cytoplasmic objects. Left, high score; right, low score. (H) Maximum value of the minimum green pixel intensity value in epithelial contiguous regions. Left, low score indicating black pixels within epithelial region; right, higher score indicating presence of epithelial regions lacking black pixels.
and stromal matrix throughout the image, with thin cords of epithe-
lial cells infiltrating through stroma across the image, so that each
stromal matrix region borders a relatively constant proportion of ep-
ithelial and stromal regions. The stromal feature with the second
largest coefficient (Fig. 4B) was the sum of the minimum green in-
tensity value of stromal-contiguous regions. This feature received a
value of zero when stromal regions contained dark pixels (such as
inflammatory nuclei). The feature received a positive value when
stromal objects were devoid of dark pixels. This feature provided in-
formation about the relationship between stromal cellular composi-
tion and prognosis and suggested that the presence of inflammatory
cells in the stroma is associated with poor prognosis, a finding con-
sistent with previous observations (32). The third most significant
stromal feature (Fig. 4C) was a measure of the relative border between
spindled stromal nuclei to round stromal nuclei, with an increased rel-
ative border of spindled stromal nuclei to round stromal nuclei asso-
ciated with worse overall survival. Although the biological underpinning
of this morphologic feature is currently not known, this analysis sug-
gested that spatial relationships between different populations of stro-
mal cell types are associated with breast cancer progression.
Reproducibility of C-Path 5YS model predictions on
samples with multiple TMA cores
For the C-Path 5YS model (which was trained on the full NKI data
set), we assessed the intrapatient agreement of model predictions when
predictions were made separately on each image contributed by pa-
tients in the VGH data set. For the 190 VGH patients who contributed
two images with complete image data, the binary predictions (high
or low risk) on the individual images agreed with each other for 69%
(131 of 190) of the cases and agreed with the prediction on the aver-
aged data for 84% (319 of 380) of the images. Using the continuous
prediction score (which ranged from 0 to 100), the median of the ab-
solute difference in prediction score among the patients with replicate
images was 5%, and the Spearman correlation among replicates was
0.27 (P = 0.0002) (fig. S3). This degree of intrapatient agreement is
only moderate, and these findings suggest significant intrapatient tumor
heterogeneity, which is a cardinal feature of breast carcinomas (33–35).
Qualitative visual inspection of images receiving discordant scores
suggested that intrapatient variability in both the epithelial and the
stromal components is likely to contribute to discordant scores for
the individual images. These differences appeared to relate both to
the proportions of the epithelium and stroma and to the appearance
of the epithelium and stroma. Last, we sought to analyze whether sur-
vival predictions were more accurate on the VGH cases that contributed
multiple cores compared to the cases that contributed only a single
core. This analysis showed that the C-Path 5YS model showed signif-
icantly improved prognostic prediction accuracy on the VGH cases
for which we had multiple images compared to the cases that con-
tributed only a single image (Fig. 7). Together, these findings show
a significant degree of intrapatient variability and indicate that increased
tumor sampling is associated with improved model performance.
DISCUSSION
Fig. 4, panel text: (A) heat map of stromal matrix objects' mean absolute difference in intensity to neighbors; H&E image separated into epithelial and stromal objects; (A to C) paired example images contrasting worse and improved prognosis.
Fig. 4. Top stromal features associated with survival. (A) Variability in ab-
solute difference in intensity between stromal matrix regions and neigh-
bors. Top panel, high score (24.1); bottom panel, low score (10.5). (Insets)
Top panel, high score; bottom panel; low score. Right panels, stromal matrix
objects colored blue (low), green (medium), or white (high) according to
each object’s absolute difference in intensity to neighbors. (B) Presence
Discovery of new criteria for predicting breast cancer prognosis
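C-Path's 5YS predictor is an L1-regularized logistic regression over thousands of quantitative image features. As a sketch of that model class only (a toy proximal-gradient loop on made-up two-feature data, not the paper's pipeline), the soft-thresholding step shows how the L1 penalty drives uninformative features toward zero:

```python
import math

def l1_logistic_fit(X, y, lam=0.05, lr=0.5, steps=2000):
    """L1-regularized logistic regression via proximal gradient descent.
    A toy sketch of the model class used by C-Path's 5YS predictor."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(alive at 5 years)
            for j in range(d):
                gw[j] += (p - yi) * xi[j] / n
            gb += (p - yi) / n
        b -= lr * gb
        for j in range(d):
            wj = w[j] - lr * gw[j]
            # Soft-thresholding: the proximal step that zeroes out weak features.
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w, b

def predict(w, b, xi):
    return sum(wj * xj for wj, xj in zip(w, xi)) + b > 0.0

# Toy data: feature 0 carries the survival signal, feature 1 is noise.
X = [[1.0, 0.3], [0.9, -0.2], [-1.0, 0.1], [-0.8, -0.4]]
y = [1, 1, 0, 0]
w, b = l1_logistic_fit(X, y)
```

On this separable toy data the fitted weight on the informative feature ends up positive and dominant, while the noise feature is shrunk by the penalty.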
PRECISION MEDICINE
Identification of type 2 diabetes subgroups through topological analysis of patient similarity
Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. Dudley1,4*
Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications. Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related complications. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was characterized by T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer malignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respectively. By assessing the human disease–SNP association for each subtype, the enriched phenotypes and biological functions at the gene level for each subtype matched with the disease comorbidities and clinical differences that we identified through EMRs. Our approach demonstrates the utility of applying the precision medicine paradigm in T2D and the promise of extending the approach to the study of other complex, multifactorial diseases.
INTRODUCTION
Type 2 diabetes (T2D) is a complex, multifactorial disease that has
emerged as an increasing prevalent worldwide health concern asso-
ciated with high economic and physiological burdens. An estimated
29.1 million Americans (9.3% of the population) were estimated to
have some form of diabetes in 2012—up 13% from 2010—with T2D
representing up to 95% of all diagnosed cases (1, 2). Risk factors for
T2D include obesity, family history of diabetes, physical inactivity, eth-
nicity, and advanced age (1, 2). Diabetes and its complications now
rank among the leading causes of death in the United States (2). In fact,
diabetes is the leading cause of nontraumatic foot amputation, adult
blindness, and need for kidney dialysis, and multiplies risk for myo-
cardial infarction, peripheral artery disease, and cerebrovascular disease
(3–6). The total estimated direct medical cost attributable to diabetes in
the United States in 2012 was $176 billion, with an estimated $76 billion
attributable to hospital inpatient care alone. There is a great need to im-
prove understanding of T2D and its complex factors to facilitate pre-
vention, early detection, and improvements in clinical management.
A more precise characterization of T2D patient populations can en-
hance our understanding of T2D pathophysiology (7, 8). Current
clinical definitions classify diabetes into three major subtypes: type 1 dia-
betes (T1D), T2D, and maturity-onset diabetes of the young. Other sub-
types based on phenotype bridge the gap between T1D and T2D, for
example, latent autoimmune diabetes in adults (LADA) (7) and ketosis-
prone T2D. The current categories indicate that the traditional definition of
diabetes, especially T2D, might comprise additional subtypes with dis-
tinct clinical characteristics. A recent analysis of the longitudinal Whitehall
II cohort study demonstrated improved assessment of cardiovascular
risks when subgrouping T2D patients according to glucose concentration
criteria (9). Genetic association studies reveal that the genetic architec-
ture of T2D is profoundly complex (10–12). Identified T2D-associated
risk variants exhibit allelic heterogeneity and directional differentiation
among populations (13, 14). The apparent clinical and genetic com-
plexity and heterogeneity of T2D patient populations suggest that there
are opportunities to refine the current, predominantly symptom-based,
definition of T2D into additional subtypes (7).
Because etiological and pathophysiological differences exist among
T2D patients, we hypothesize that a data-driven analysis of a clinical
population could identify new T2D subtypes and factors. Here, we de-
velop a data-driven, topology-based approach to (i) map the complexity
of patient populations using clinical data from electronic medical re-
cords (EMRs) and (ii) identify new, emergent T2D patient subgroups
with subtype-specific clinical and genetic characteristics. We apply this
approach to a data set comprising matched EMRs and genotype data from
more than 11,000 individuals. Topological analysis of these data revealed
three distinct T2D subtypes that exhibited distinct patterns of clinical
characteristics and disease comorbidities. Further, we identified genetic
markers associated with each T2D subtype and performed gene- and
pathway-level analysis of subtype genetic associations. Biological and
phenotypic features enriched in the genetic analysis corroborated clinical
disparities observed among subgroups. Our findings suggest that data-driven, topological analysis of patient cohorts has utility in precision medicine efforts to refine our understanding of T2D toward improving patient care.
1Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 700 Lexington Ave., New York, NY 10065, USA. 2Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA. 3Division of Endocrinology, Diabetes, and Bone Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4Department of Health Policy and Research, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
*Corresponding author. E-mail: joel.dudley@mssm.edu
www.ScienceTranslationalMedicine.org 28 October 2015 Vol 7 Issue 311 311ra174 1
and vision defects (RR, 1.32; range, 1.04
to 1.67), than were the other two subtypes
(Table 2A). Patients in subtype 2 (n = 617)
were more likely to associate with diseases
of cancer of bronchus: lung (RR, 3.76; range,
1.14 to 12.39); malignant neoplasm with-
out specification of site (RR, 3.46; range,
1.23 to 9.70); tuberculosis (RR, 2.93; range,
1.30 to 6.64); coronary atherosclerosis and
other heart disease (RR, 1.28; range, 1.01 to
1.61); and other circulatory disease (RR, 1.27;
range, 1.02 to 1.58), than were the other two
subtypes (Table 2B). Patients in subtype 3
(n = 1096) were more often diagnosed with
HIV infection (RR, 1.92; range, 1.30 to 2.85)
and were associated with E codes (that is,
external causes of injury care) (RR, 1.84;
range, 1.41 to 2.39); aortic and peripheral
arterial embolism or thrombosis (RR, 1.79;
range,1.18to 2.71); hypertension withcom-
plications and secondary hypertension (RR,
1.66; range, 1.29 to 2.15); coronary athero-
sclerosis and other heart disease (RR, 1.41;
range, 1.15 to 1.72); allergic reactions (RR,
1.42; range, 1.19 to 1.70); deficiency and other
anemia (RR, 1.39; range, 1.14 to 1.68); and
screening and history of mental health and
substance abuse code (RR, 1.30; range, 1.07
to 1.58) (Table 2C).
Significant disease–genetic variant
enrichments specific to T2D subtypes
We next evaluated the genetic variants sig-
nificantly associated with each of the three
subtypes. Observed genetic associations and
gene-level [that is, single-nucleotide poly-
morphisms (SNPs) mapped to gene-level
annotations] enrichments by hypergeometric
analysis are considered independent of the
Fig. 1. Patient and genotype networks. (A)
Patient-patient network for topology patterns
on 11,210 Biobank patients. Each node repre-
sents a single or a group of patients with the
significant similarity based on their clinical
features. Edge connected with nodes indicates
the nodes have shared patients. Red color rep-
resents the enrichment for patients with T2D
diagnosis, and blue color represents the non-
enrichment for patients with T2D diagnosis.
(B) Patient-patient network for topology pat-
terns on 2551 T2D patients. Each node repre-
sents a single or a group of patients with the
significant similarity based on their clinical
features. Edge connected with nodes indicates
the nodes have shared patients. Red color rep-
resents the enrichment for patients with females,
and blue color represents the enrichment for
males.
Three subgroups of type 2 diabetes discovered
(using topological analysis; in any case, a data-driven analysis)
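The paper derives its subtypes from topology-based patient-patient networks. A drastically simplified, illustrative stand-in for that idea: connect patients whose clinical feature vectors lie close together, then read off connected components as candidate subgroups (the actual study uses topological data analysis, not this toy threshold graph):

```python
import math

def patient_subgroups(patients, threshold):
    """Connect patients whose feature vectors lie within `threshold` of each
    other, then return connected components as candidate subgroups.
    A simplified stand-in for the paper's topological analysis."""
    n = len(patients)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(patients[i], patients[j]) <= threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, groups = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:  # depth-first traversal of one component
            k = stack.pop()
            if k in seen:
                continue
            seen.add(k)
            comp.append(k)
            stack.extend(adj[k])
        groups.append(sorted(comp))
    return groups

# Three well-separated clusters of toy "patients" yield three subgroups.
toy = [(0, 0), (0.1, 0.2), (5, 5), (5.2, 4.9), (10, 0), (9.8, 0.3)]
print(patient_subgroups(toy, 1.0))  # → [[0, 1], [2, 3], [4, 5]]
```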
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
Asthma classified into five subgroups
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
Treatment effects differ by subgroup
A paradigm shift?
Levels of autonomous driving
• 100% manual
• Cruise control / lane-departure warning
• Steers by itself (under driver supervision)
• Handles congested roads on its own (driver may do other things)
• Drives to the destination by itself (driver may even sleep)
• Full automation (no driver, no steering wheel)
Levels of autonomous driving
• Paradigm shift: the final stage, where it becomes illegal for a human to drive
Self-driving cars today
Medical AI today
When it is illegal for a human to drive
When it is legal for the AI to drive
Reads fundus photographs and diagnoses diabetic retinopathy (DR) without physician involvement
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test

| Observer | Test 1: Classification (AUROC) | Test 1: Detection (JAFROC FOM) | DLAD vs. Test 1 (P): Classification | DLAD vs. Test 1 (P): Detection | Test 2: Classification (AUROC) | Test 2: Detection (JAFROC FOM) | Test 1 vs. Test 2 (P): Classification | Test 1 vs. Test 2 (P): Detection |
|---|---|---|---|---|---|---|---|---|
| Nonradiology physicians | | | | | | | | |
| Observer 1 | 0.77 | 0.716 | <.001 | <.001 | 0.91 | 0.853 | <.001 | <.001 |
| Observer 2 | 0.78 | 0.657 | <.001 | <.001 | 0.90 | 0.846 | <.001 | <.001 |
| Observer 3 | 0.80 | 0.700 | <.001 | <.001 | 0.88 | 0.783 | <.001 | <.001 |
| Group | | 0.691 | | <.001* | | 0.828 | | <.001* |
| Radiology residents | | | | | | | | |
| Observer 4 | 0.78 | 0.767 | <.001 | <.001 | 0.80 | 0.785 | .02 | .03 |
| Observer 5 | 0.86 | 0.772 | .001 | <.001 | 0.91 | 0.837 | .02 | <.001 |
| Observer 6 | 0.86 | 0.789 | .05 | .002 | 0.86 | 0.799 | .08 | .54 |
| Observer 7 | 0.84 | 0.807 | .01 | .003 | 0.91 | 0.843 | .003 | .02 |
| Observer 8 | 0.87 | 0.797 | .10 | .003 | 0.90 | 0.845 | .03 | .001 |
| Observer 9 | 0.90 | 0.847 | .52 | .12 | 0.92 | 0.867 | .04 | .03 |
| Group | | 0.790 | | <.001* | | 0.867 | | <.001* |
| Board-certified radiologists | | | | | | | | |
| Observer 10 | 0.87 | 0.836 | .05 | .01 | 0.90 | 0.865 | .004 | .002 |
| Observer 11 | 0.83 | 0.804 | <.001 | <.001 | 0.84 | 0.817 | .03 | .04 |
| Observer 12 | 0.88 | 0.817 | .18 | .005 | 0.91 | 0.841 | .01 | .01 |
| Observer 13 | 0.91 | 0.824 | >.99 | .02 | 0.92 | 0.836 | .51 | .24 |
| Observer 14 | 0.88 | 0.834 | .14 | .03 | 0.88 | 0.840 | .87 | .23 |
| Group | | 0.821 | | .02* | | 0.840 | | .01* |
| Thoracic radiologists | | | | | | | | |
| Observer 15 | 0.94 | 0.856 | .15 | .21 | 0.96 | 0.878 | .08 | .03 |
| Observer 16 | 0.92 | 0.854 | .60 | .17 | 0.93 | 0.872 | .34 | .02 |
| Observer 17 | 0.86 | 0.820 | .02 | .01 | 0.88 | 0.838 | .14 | .12 |
| Observer 18 | 0.84 | 0.800 | <.001 | <.001 | 0.87 | 0.827 | .02 | .02 |
| Group | | 0.833 | | .08* | | 0.854 | | <.001* |

Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience. Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine, respectively.
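With the table in hand, the assistance effect can be sanity-checked by averaging the observers' classification AUROCs with and without DLAD (values transcribed from Table 3 above; a rough summary, not the paper's own statistics):

```python
# Observer AUROCs for radiograph classification, transcribed from Table 3:
# Test 1 = observer alone, Test 2 = observer + DLAD (observers 1-18 in order).
test1 = [0.77, 0.78, 0.80, 0.78, 0.86, 0.86, 0.84, 0.87, 0.90,
         0.87, 0.83, 0.88, 0.91, 0.88, 0.94, 0.92, 0.86, 0.84]
test2 = [0.91, 0.90, 0.88, 0.80, 0.91, 0.86, 0.91, 0.90, 0.92,
         0.90, 0.84, 0.91, 0.92, 0.88, 0.96, 0.93, 0.88, 0.87]

# Per-observer AUROC gain when assisted by DLAD.
gains = [b - a for a, b in zip(test1, test2)]
print(round(sum(gains) / len(gains), 3))  # mean gain → 0.038
```

No observer's classification AUROC decreased with DLAD assistance, and the largest gains belong to the nonradiology physicians.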
Annotated table labels (translated):
• Physician | AI vs. physician alone (p value) | Physician + AI | Physician vs. physician + AI (p value)
• Radiology residents: 1st-, 2nd-, and 3rd-year
• Non-radiology physicians: 4th-year residents in obstetrics-gynecology, orthopedics, and internal medicine
• Board-certified radiologists: 7 and 8 years of experience
• Thoracic radiologists: 26, 13, and 9 years of experience
• AI alone: 0.91 (radiograph classification), 0.885 (nodule detection)
• "AI alone" was more accurate than "board-certified radiologist + AI" in most cases:
• Classification: better than 6 of the 9 radiologists
• Nodule detection: better than all 9

When digital medicine becomes the medicine (1/2)

━ Seungwook Paek, CEO of Lunit

This book covers not only the latest trends in medical AI but also their significance, limitations, and outlook, along with plenty of food for thought. Even on contentious issues, the author presents his own perspective persuasively, grounded in clear evidence. I personally plan to use this book as a textbook for my graduate course.
━ Soo-Yong Shin, Professor, Department of Digital Health, Sungkyunkwan University
Vinod Khosla
Founder and first CEO of Sun Microsystems; Partner of KPCB; CEO of Khosla Ventures
Legendary venture capitalist in Silicon Valley
“Technology will replace 80% of doctors”
https://www.youtube.com/watch?time_continue=70&v=2HMPRXstSvQ
"We should stop training radiologists right now. It is self-evident that within five years deep learning will outperform radiologists."
Hinton on Radiology
• "2018 Q3 was the best quarter ever to raise investment"
• Investment through Q3 2018 already exceeded the total for all of 2017
• Larger amounts are being invested more frequently across all rounds: an entrepreneurs' market
Map of healthcare-related fields (ver 0.3)
• Healthcare: broad health management that uses no digital technology and is not professional medicine (e.g., exercise, nutrition, sleep)
• Digital healthcare: health management that uses digital technology (e.g., IoT, artificial intelligence, 3D printing, VR/AR)
• Mobile healthcare: digital healthcare that uses mobile technology (e.g., smartphones, IoT, social media)
• Personal genome analysis (e.g., cancer genomes, disease risk, carrier status, drug sensitivity; wellness, ancestry)
• Medicine: the professional medical domain of disease prevention, treatment, prescription, and management; telemedicine and remote care
  • 16.
    EDITORIAL (OPEN) Digital medicine, on its way to being just plain medicine. npj Digital Medicine (2018) 1:20175; doi:10.1038/s41746-017-0005-1. There are already nearly 30,000 peer-reviewed English-language scientific journals, producing an estimated 2.5 million articles a year.1 So why another, and why one focused specifically on digital medicine? To answer that question, we need to begin by defining what “digital medicine” means: using digital tools to upgrade the practice of medicine to one that is high-definition and far more individualized. It encompasses our ability to digitize human beings using biosensors that track our complex physiologic systems, but also the means to process the vast data generated via algorithms, cloud computing, and artificial intelligence. It has the potential to democratize medicine, with smartphones as the hub, enabling each individual to generate their own real world data and being far more engaged with their health. Add to this new imaging tools, mobile device laboratory capabilities, end-to-end digital clinical trials, telemedicine, and one can see there is a remarkable array of transformative technology which lays the groundwork for a new form of healthcare. As is obvious by its definition, the far-reaching scope of digital medicine straddles many and widely varied expertise. Computer scientists, healthcare providers, engineers, behavioral scientists, ethicists, clinical researchers, and epidemiologists are just some of the backgrounds necessary to move the field forward. But to truly accelerate the development of digital medicine solutions in health requires the collaborative and thoughtful interaction between individuals from several, if not most of these specialties. That is the primary goal of npj Digital Medicine: to serve as a cross-cutting resource for everyone interested in this area, fostering collaborations and accelerating its advancement. Current systems of healthcare face multiple insurmountable challenges.
Patients are not receiving the kind of care they want and need, caregivers are dissatisfied with their role, and in most countries, especially the United States, the cost of care is unsustainable. We are confident that the development of new systems of care that take full advantage of the many capabilities that digital innovations bring can address all of these major issues. Researchers too, can take advantage of these leading-edge technologies as they enable clinical research to break free of the confines of the academic medical center and be brought into the real world of participants’ lives. The continuous capture of multiple interconnected streams of data will allow for a much deeper refinement of our understanding and definition of most pheno- types, with the discovery of novel signals in these enormous data sets made possible only through the use of machine learning. Our enthusiasm for the future of digital medicine is tempered by the recognition that presently too much of the publicized work in this field is characterized by irrational exuberance and excessive hype. Many technologies have yet to be formally studied in a clinical setting, and for those that have, too many began and ended with an under-powered pilot program. In addition, there are more than a few examples of digital “snake oil” with substantial uptake prior to their eventual discrediting.2 Both of these practices are barriers to advancing the field of digital medicine. Our vision for npj Digital Medicine is to provide a reliable, evidence-based forum for all clinicians, researchers, and even patients, curious about how digital technologies can transform every aspect of health management and care. 
    Being open access, as all medical research should be, allows for the broadest possible dissemination, which we will strongly encourage, including through advocating for the publication of preprints. And finally, quite paradoxically, we hope that npj Digital Medicine is so successful that in the coming years there will no longer be a need for this journal, or any journal specifically focused on digital medicine. Because if we are able to meet our primary goal of accelerating the advancement of digital medicine, then soon, we will just be calling it medicine. And there are already several excellent journals for that. ACKNOWLEDGEMENTS Supported by the National Institutes of Health (NIH)/National Center for Advancing Translational Sciences grant UL1TR001114 and a grant from the Qualcomm Foundation. ADDITIONAL INFORMATION Competing interests: The authors declare no competing financial interests. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Change history: The original version of this Article had an incorrect Article number of 5 and an incorrect Publication year of 2017. These errors have now been corrected in the PDF and HTML versions of the Article. Steven R. Steinhubl and Eric J. Topol, Scripps Translational Science Institute, 3344 North Torrey Pines Court, Suite 300, La Jolla, CA 92037, USA. Correspondence: Steven R. Steinhubl (steinhub@scripps.edu) or Eric J. Topol (etopol@scripps.edu) REFERENCES 1. Ware, M. & Mabe, M. The STM report: an overview of scientific and scholarly journal publishing 2015 [updated March]. http://digitalcommons.unl.edu/scholcom/92017 (2015). 2. Plante, T. B., Urrea, B. & MacFarlane, Z. T. et al. Validation of the instant blood pressure smartphone App. JAMA Intern. Med. 176, 700–702 (2016).
    Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). © The Author(s) 2018. Received: 19 October 2017; Accepted: 25 October 2017. www.nature.com/npjdigitalmed. Published in partnership with the Scripps Translational Science Institute.
The future of digital medicine? To become just plain medicine.
  • 17.
    When Digital Medicine Becomes the Medicine • Data, data, data • Medical artificial intelligence • Telemedicine • VR/AR-based training and surgery • Digital therapeutics • Patient-driven medicine
  • 18.
    What is the most important factor in digital medicine?
  • 19.
    “Data! Data! Data!” he cried. “I can’t make bricks without clay!” - Sherlock Holmes, “The Adventure of the Copper Beeches”
  • 20.
  • 22.
    New data are being measured, stored, integrated, and analyzed in new ways, by new actors: new types, quality, and quantity of data (wearable devices, smartphones, personal genome analysis, artificial intelligence, SNS), driven by users, patients, and the general public.
  • 23.
    The three steps of digital healthcare: • Step 1. Measuring the data • Step 2. Integrating the data • Step 3. Analyzing the data
  • 24.
  • 25.
  • 26.
    Otoscope, dermatoscope; eye disease, skin cancer, parasites, respiration, ECG, sleep, diet, physical activity, fever, menstruation/pregnancy
  • 27.
  • 28.
  • 31.
  • 32.
    Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female’s most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. Levels of social interaction (also known to a…). PLOS Medicine 2016
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
    • 2003, Human Genome Project: 13 years (676 weeks), $2,700,000,000 • 2007, Dr. Craig Venter’s genome: 4 years (208 weeks), $100,000,000 • 2008, Dr. James Watson’s genome: 4 months (16 weeks), $1,000,000 • 2009 (Nature Biotechnology): 4 weeks, $48,000 • 2013: 1–2 weeks, ~$5,000
  • 38.
    The $1,000 Genome is Already Here!
  • 39.
    • NovaSeq 5000 and 6000 announced in January 2017 • Illumina pledged to achieve a $100 WES within a few years • Capable of running WES for 60 people in 2 days (under one hour per person)
  • 41.
  • 42.
    DNA SEQUENCING SOARS. Human genomes are being sequenced at an ever-increasing rate. The 1000 Genomes Project has aggregated hundreds of genomes; The Cancer Genome Atlas (TCGA) has gathered several thousand; and the Exome Aggregation Consortium (ExAC) has sequenced more than 60,000 exomes. Dotted lines show three possible future growth curves for the cumulative number of human genomes: doubling every 7 months (historical growth rate), every 12 months (Illumina estimate), or every 18 months (Moore's law). Michael Eisenstein, Nature, 2015
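The three dotted growth curves diverge enormously within a decade. A minimal sketch of the compounding arithmetic, assuming (for illustration only) about one million cumulative genomes at mid-2015 as the figure's "current amount":

```python
from datetime import date

def projected_genomes(n0: int, t0: date, t: date, doubling_months: float) -> int:
    """Extrapolate a cumulative genome count under a fixed doubling time."""
    elapsed_months = (t.year - t0.year) * 12 + (t.month - t0.month)
    return round(n0 * 2 ** (elapsed_months / doubling_months))

start, target = date(2015, 7, 1), date(2025, 7, 1)
for label, months in [("7 months (historical)", 7),
                      ("12 months (Illumina estimate)", 12),
                      ("18 months (Moore's law)", 18)]:
    print(f"doubling every {label}: ~{projected_genomes(1_000_000, start, target, months):,}")
```

Under the historical 7-month doubling this crosses 10^11 genomes by 2025, while Moore's-law growth stays near 10^8, which is why the projections in the figure fan out so dramatically.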
  • 43.
    Sequencing Applications in Medicine from Prewomb to Tomb. Cell. 2014 Mar 27; 157(1): 241–253.
  • 44.
    Data source (4): the digital phenotype
  • 45.
    Digital Phenotype: Your smartphone knows if you are depressed (Ginger.io)
  • 46.
    Digital Phenotype: Your smartphone knows if you are depressed. J Med Internet Res. 2015 Jul 15;17(7):e175. The correlation analysis between the features and the PHQ-9 scores revealed that 6 of the 10 features were significantly correlated with the scores: • strong correlation: circadian movement, normalized entropy, location variance • correlation: phone usage features, usage duration and usage frequency
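The correlation step in a study like this reduces to computing Pearson's r between each sensor-derived feature and the participants' PHQ-9 scores. A minimal sketch with the standard library only; the feature values and scores below are invented for illustration, and the actual study also assessed statistical significance:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant values: lower location variance
# (less movement between places) alongside higher PHQ-9 scores.
location_variance = [8.2, 6.5, 5.1, 3.9, 2.4, 1.8]
phq9_score        = [3,   5,   8,   11,  14,  18]
print(round(pearson_r(location_variance, phq9_score), 2))
```

A strongly negative r here would mirror the paper's finding that restricted mobility patterns track with depressive symptoms.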
  • 47.
    the manifestations of disease by providing a more comprehensive and nuanced view of the experience of illness. Through the lens of the digital phenotype, an individual's interaction… The digital phenotype. Sachin H Jain, Brian W Powers, Jared B Hawkins & John S Brownstein. In the coming years, patient phenotypes captured to enhance health and wellness will extend to human interactions with digital technology. In 1982, the evolutionary biologist Richard Dawkins introduced the concept of the "extended phenotype"1, the idea that phenotypes should not be limited just to biological processes, such as protein biosynthesis or tissue growth, but extended to include all effects that a gene has on its environment inside or outside of the body of the individual organism. Dawkins stressed that many delineations of phenotypes are arbitrary. Animals and humans can modify their environments, and these modifications and associated behaviors are expressions of one's genome and, thus, part of their extended phenotype. In the animal kingdom, he cites dam building by beavers as an example of the beaver's extended phenotype1. As personal technology becomes increasingly embedded in human lives, we think there is an important extension of Dawkins's theory: the notion of a 'digital phenotype'. Can aspects of our interface with technology be somehow diagnostic and/or prognostic for certain conditions? Can one's clinical data be linked and analyzed together with online activity and behavior data to create a unified, nuanced view of human disease? Here, we describe the concept of the digital phenotype. Although several disparate studies have touched on this notion, the framework for medicine has yet to be described. We attempt to define digital phenotype and further describe the opportunities and challenges in incorporating these data into healthcare. Figure 1: Timeline of insomnia-related tweets from representative individuals. Density distributions (probability density functions) are shown for seven individual users over a two-year period. Density on the y axis highlights periods of relative activity for each user. A representative tweet from each user is shown as an example. http://www.nature.com/nbt/journal/v33/n5/full/nbt.3223.html
  • 48.
    Your Twitter knows if you cannot sleep: timeline of insomnia-related tweets from representative individuals (Jain, Powers, Hawkins & Brownstein, "The digital phenotype," Nat. Biotechnol. 2015).
  • 49.
    Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016): higher hue (bluer), lower saturation (grayer), lower brightness (darker)
  • 50.
    Digital Phenotype: Your Instagram knows if you are depressed. Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016). Results: Both All-data and Pre-diagnosis models were decisively superior to a null model (K_All = 157.5; K_Pre = 149.8). All-data predictors were significant with 99% probability. Pre-diagnosis and All-data confidence levels were largely identical, with two exceptions: Pre-diagnosis Brightness decreased to 90% confidence, and Pre-diagnosis posting frequency dropped to 30% confidence, suggesting a null predictive value in the latter case. Increased hue, along with decreased brightness and saturation, predicted depression. This means that photos posted by depressed individuals tended to be bluer, darker, and grayer (see Fig. 2). The more comments Instagram posts received, the more likely they were posted by depressed participants, but the opposite was true for likes received. In the All-data model, higher posting frequency was also associated with depression. Depressed participants were more likely to post photos with faces, but had a lower average face count per photograph than healthy participants. Finally, depressed participants were less likely to apply Instagram filters to their posted photos. Fig. 2. Magnitude and direction of regression coefficients in All-data (N=24,713) and Pre-diagnosis (N=18,513) models. X-axis values represent the adjustment in odds of an observation belonging to depressed individuals. Fig. 1. Comparison of HSV values. The right photograph has higher hue (bluer), lower saturation (grayer), and lower brightness (darker) than the left photograph. Instagram photos posted by depressed individuals had HSV values shifted towards those in the right photograph, compared with photos posted by healthy individuals.
    Units of observation: In determining the best time span for this analysis, we encountered a difficult question: When and for how long does depression occur? A diagnosis of depression does not indicate the persistence of a depressive state for every moment of every day, and to conduct analysis using an individual's entire posting history as a single unit of observation is therefore rather specious. At the other extreme, to take each individual photograph as the unit of observation runs the risk of being too granular. De Choudhury et al. (5) looked at all of a given user's posts in a single day, and aggregated those data into per-person, per-day units of observation. We adopted this precedent of "user-days" as a unit of analysis. Statistical framework: We used Bayesian logistic regression with uninformative priors to determine the strength of individual predictors. Two separate models were trained. The All-data model used all collected data to address Hypothesis 1. The Pre-diagnosis model used all data collected prior to the date of first diagnosis.
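The HSV comparison at the heart of the study can be reproduced with the standard library alone: convert each pixel from RGB to HSV and average per photo; photos from depressed users skewed toward higher hue, lower saturation, and lower brightness. A toy sketch, in which the two two-pixel "photos" are invented; a real analysis would read actual image pixels, e.g., with Pillow:

```python
import colorsys

def mean_hsv(pixels):
    """Average (hue, saturation, value) over an iterable of RGB pixels in [0, 1]."""
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    return tuple(sum(channel[i] for channel in hsv) / len(hsv) for i in range(3))

warm_bright = [(0.9, 0.6, 0.3), (0.8, 0.5, 0.2)]   # orange-ish, bright
blue_dark   = [(0.2, 0.3, 0.5), (0.1, 0.2, 0.4)]   # blue-ish, dark
h1, s1, v1 = mean_hsv(warm_bright)
h2, s2, v2 = mean_hsv(blue_dark)
# The pattern the paper associates with depression: bluer (higher hue) and darker (lower value).
print(h2 > h1, v2 < v1)
```

In `colorsys`, hue runs 0 to 1 around the color wheel with blue near 0.6-0.7, so "higher hue" in this range does mean bluer, matching the paper's Fig. 1 description.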
  • 51.
    Digital Phenotype: Your Instagram knows if you are depressed. Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016). Filter usage differed significantly between groups (χ²_All = 907.84, p = 9.17e-164; χ²_Pre = 813.80, p = 2.87e-144). In particular, depressed participants were less likely than healthy participants to use any filters at all. When depressed participants did employ filters, they most disproportionately favored the “Inkwell” filter, which converts color photographs to black-and-white images. Conversely, healthy participants most disproportionately favored the Valencia filter, which lightens the tint of photos. Examples of filtered photographs are provided in SI Appendix VIII. Fig. 3. Instagram filter usage among depressed and healthy participants. Bars indicate difference between observed and expected usage frequencies, based on a Chi-squared analysis of independence. Blue bars indicate disproportionate use of a filter by depressed compared to healthy participants, orange bars indicate the reverse.
  • 52.
    Digital Phenotype: Your Instagram knows if you are depressed. Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016). VIII. Instagram filter examples. Fig. S8. Examples of Inkwell and Valencia Instagram filters. Inkwell converts color photos to black-and-white; Valencia lightens tint. Depressed participants most favored Inkwell compared to healthy participants; healthy participants most favored Valencia.
  • 53.
    Data source (5): the microbiome
  • 54.
    Leading Edge Review: Individualized Medicine from Prewomb to Tomb. Eric J. Topol, The Scripps Translational Science Institute, The Scripps Research Institute and Scripps Health, La Jolla, CA 92037, USA. Correspondence: etopol@scripps.edu. http://dx.doi.org/10.1016/j.cell.2014.02.012. That each of us is truly biologically unique, extending to even monozygotic, "identical" twins, is not fully appreciated. Now that it is possible to perform a comprehensive "omic" assessment of an individual, including one's DNA and RNA sequence and at least some characterization of one's proteome, metabolome, microbiome, autoantibodies, and epigenome, it has become abundantly clear that each of us has truly one-of-a-kind biological content. Well beyond the allure of the matchless fingerprint or snowflake concept, these singular, individual data and information set up a remarkable and unprecedented opportunity to improve medical treatment and develop preventive strategies to preserve health. From Digital to Biological to Individualized Medicine: In 2010, Eric Schmidt of Google said "The power of individual targeting: the technology will be so good it will be very hard for people to watch or consume something that has not in some sense been tailored for them" (Jenkins, 2010). Although referring to the capability of digital technology, we have now reached a time of convergence of the digital and biologic domains. It has been well established that 0 and 1 are interchangeable with A, C, T, and G in books and Shakespeare sonnets and that DNA may represent the ultimate data storage system (Church et al., 2012; Goldman et al., 2013b). Biological transistors, also known as genetic logic gates, have now been developed that make a computer from a living cell (Bonnet et al., 2013).
    The convergence of biology and technology was further captured by one of the protagonists of the digital era, Steve Jobs, who said "I think the biggest innovations of the 21st century will be at the intersection of biology and technology. A new era is beginning" (Isaacson, 2011). With whole-genome DNA sequencing and a variety of omic technologies to define aspects of each individual's biology at many different levels, we have indeed embarked on a new era of medicine. The term "personalized medicine" has been used for many years but has engendered considerable confusion. A recent survey indicated that only 4% of the public understand what the term is intended to mean (Stanton, 2013), and the hackneyed, commercial use of "personalized" makes many people think that this refers to a concierge service of medical care. Whereas "person" refers to a human being, "personalized" can mean anything from having monogrammed stationery or luggage to ascribing personal qualities. Therefore, it was not surprising that a committee representing the National Academy of Sciences proposed using the term "precision medicine" as defined by "tailoring of medical treatment to the individual characteristics of each patient" (National Research Council, 2011). Although the term "precision" denotes the objective of exactness, ironically, it too can be viewed as ambiguous in this context because it does not capture the sense that the information is derived from the individual. For example, many laboratory tests could be made more precise by assay methodology, and treatments could be made more precise by avoiding side effects, without having anything to do with a specific individual. Other terms that have been suggested include genomic, digital, and stratified medicine, but all of these have a similar problem or appear to be too narrowly focused. The definition of individual is a single human being, derived from the Latin word individu, or indivisible.
    I propose individualized medicine as the preferred term because it has a useful double entendre. It relates not only to medicine that is particularized to a human being but also the future impact of digital technology on individuals driving their health care. There will increasingly be the flow of one's biologic data and relevant medical information directly to the individual. Be it a genome sequence on a tablet or the results of a biosensor for blood pressure or another physiologic metric displayed on a smartphone, the digital convergence with biology will definitively anchor the individual as a source of salient data, the conduit of information flow, and a, if not the, principal driver of medicine in the future. The Human GIS: Perhaps the most commonly used geographic information systems (GIS) are Google maps, which provide a layered approach to data visualization, such as viewing a location via satellite overlaid with street names, landmarks, and real-time traffic data. This GIS exemplifies the concept of gathering and transforming large bodies of data to provide exquisite temporal and location information. With the multiple virtual views, it gives one the sense of physically being on site. Although Google has digitized and thus created a GIS for the Earth, it is now possible to digitize a human being. As shown in Figure 1, there are multiple layers of data that can now be obtained for any individual. This includes data from biosensors, scanners, electronic medical records, social media, and the various omics. Cell 157, March 27, 2014, ©2014 Elsevier Inc., p. 241.
  • 55.
    Figure 1.
Geographic Information System of a Human Being
  • 56.
  • 57.
    In evidence-based medicine, the higher the level of evidence, the more patients tend to be abstracted into groups rather than considered as individuals.
  • 58.
    Instead of individual patients, the object of study becomes the distribution of a patient population, and the goal is to find statistically significant differences between distributions.
  • 59.
    What if, based on all available multidimensional data, we could find the treatment that fits the characteristics of each individual patient?
  • 60.
  • 61.
    Two strategies for data-driven medicine: • Top-down: set up a hypothesis first, then collect the specific kinds of data needed to test it. • Bottom-up: first gather as much of 'all' the data as possible, and expect something big to emerge from it.
  • 62.
  • 63.
    ©2017 Nature America, Inc., part of Springer Nature. NATURE BIOTECHNOLOGY, ADVANCE ONLINE PUBLICATION. ARTICLES.
    A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nathan D Price, Andrew T Magis, John C Earls, Gustavo Glusman, Roie Levy, Christopher Lausted, Daniel T McDonald, Ulrike Kusebauch, Christopher L Moss, Yong Zhou, Shizhen Qin, Robert L Moritz, Kristin Brogaard, Gilbert S Omenn, Jennifer C Lovejoy & Leroy Hood.
    Personal data for 108 individuals were collected during a 9-month period, including whole genome sequences; clinical tests, metabolomes, proteomes, and microbiomes at three time points; and daily activity tracking. Using all of these data, we generated a correlation network that revealed communities of related analytes associated with physiology and disease. Connectivity within analyte communities enabled the identification of known and candidate biomarkers (e.g., gamma-glutamyltyrosine was densely interconnected with clinical analytes for cardiometabolic disease). We calculated polygenic scores from genome-wide association studies (GWAS) for 127 traits and diseases, and used these to discover molecular correlates of polygenic risk (e.g., genetic risk for inflammatory bowel disease was negatively correlated with plasma cystine). Finally, behavioral coaching informed by personal data helped participants to improve clinical biomarkers. Our results show that measurement of personal data clouds over time can improve our understanding of health and disease, including early transitions to disease states.
    In order to understand the basis of wellness and disease, we and others have pursued a global and holistic approach termed 'systems medicine'1. The defining feature of systems medicine is the collection of diverse longitudinal data for each individual. These data sets can be used to unravel the complexity of human biology and disease by assessing both genetic and environmental determinants of health and their interactions. We refer to such data as personal, dense, dynamic data clouds: personal, because each data cloud is unique to an individual; dense, because of the high number of measurements; and dynamic, because we monitor longitudinally. The convergence of advances in systems medicine, big data analysis, individual measurement devices, and consumer-activated social networks has led to a vision of healthcare that is predictive, preventive, personalized, and participatory (P4)2, also known as 'precision medicine'. Personal, dense, dynamic data clouds are indispensable to realizing this vision3. The US healthcare system invests 97% of its resources on disease care4, with little attention to wellness and disease prevention. Here we investigate scientific wellness, which we define as a quantitative data-informed approach to maintaining and improving health and avoiding disease. Several recent studies have illustrated the utility of multi-omic longitudinal data to look for signs of reversible early disease or disease risk factors in single individuals. The dynamics of human gut and salivary microbiota in response to travel abroad and enteric infection was characterized in two individuals using daily stool and saliva samples5. Daily multi-omic data collection from one individual over 14 months identified signatures of respiratory infection and the onset of type 2 diabetes6. Crohn's disease progression was tracked over many years in one individual using regular blood and stool measurements7. Each of these studies yielded insights into system dynamics even though they had only one or two participants.
    We report the generation and analysis of personal, dense, dynamic data clouds for 108 individuals over the course of a 9-month study that we call the Pioneer 100 Wellness Project (P100). Our study included whole genome sequences; clinical tests, metabolomes, proteomes, and microbiomes at 3-month intervals; and frequent activity measurements (i.e., wearing a Fitbit). This study takes a different approach from previous studies, in that a broad set of assays were carried out less frequently in a (comparatively) large number of people. Furthermore, we identified 'actionable possibilities' for each individual to enhance her/his health. Risk factors that we observed in participants' clinical markers and genetics were used as a starting point to identify actionable possibilities for behavioral coaching. We report the correlations among different data types and identify population-level changes in clinical markers. This project is the pilot for the 100,000 (100K) person wellness project that we proposed in 2014 (ref. 8). An increased scale of personal, dense, dynamic data clouds in future holds the potential to improve our understanding of scientific wellness and delineate early warning signs for human diseases.
    RESULTS The P100 study had four objectives. First, establish cost-efficient procedures for generating, storing, and analyzing multiple sources…
    Affiliations: 1 Institute for Systems Biology, Seattle, Washington, USA. 2 Arivale, Seattle, Washington, USA. 3 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. 4 Providence St. Joseph Health, Seattle, Washington, USA. 5 Present address: University of California, San Diego, San Diego, California, USA. 6 These authors contributed equally to this work. 7 These authors jointly supervised this work. Correspondence should be addressed to N.D.P. (nathan.price@systemsbiology.org) or L.H. (lhood@systemsbiology.org). Received 16 October 2016; accepted 11 April 2017; published online 17 July 2017; doi:10.1038/nbt.3870
  • 64.
Leroy Hood, MD, PhD (Institute for Systems Biology)
  • 66.
Pioneer 100 Wellness Project (pilot of the 100K person wellness project)
  • 67.
[Figure 1 of the paper: study design. Three rounds of coaching sessions spread over 9 months, with sampling at each round across the following assays.]
• Clinical labs (blood sample): cardiovascular (HDL/LDL cholesterol, triglycerides, particle profiles, and other markers); diabetes risk (fasting glucose, HbA1c, insulin, and other markers); inflammation (IL-6, IL-8, and other markers); nutrition and toxins (ferritin, vitamin D, glutathione, mercury, lead, and other markers)
• Metabolomics (blood sample): xenobiotics and metabolism-related small molecules
• Genetics (blood sample): whole genome sequence
• Proteomics (blood sample): inflammation, cardiovascular, liver, brain, and heart-related proteins
• Gut microbiome (stool sample): 16S rRNA sequencing
• Quantified self (activity tracker): daily activity
• Stress (saliva): four-point cortisol
Let's measure all the available multidimensional data.
  • 68.
[Figure 2 of the paper: an inter-omic correlation network drawn across proteomics, genetic traits, the microbiome, clinical labs, and metabolomics; only the individual analyte labels survive extraction and are omitted here.]
Figure 2 Top 100 correlations per pair of data types. Subset of top statistically significant Spearman inter-omic cross-sectional correlations between all data sets collected in our cohort. Each line represents one correlation that was significant after adjustment for multiple hypothesis testing using the method of Benjamini and Hochberg10 at padj < 0.05. The mean of all three time points was used to compute the correlations between analytes. Up to 100 correlations per pair of data types are shown in this figure. See Supplementary Figure 1 and Supplementary Table 2 for the complete inter-omic cross-sectional network. Nature Biotechnology 2017
Of all the measured data types, the top 100 most highly correlated pairs per pair of data types were selected.
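The caption names two statistical ingredients: Spearman rank correlation between analytes and Benjamini-Hochberg adjustment of the resulting p-values. A stdlib-only sketch of both, not the authors' actual pipeline; the "analyte" values and the p-value list below are invented.

```python
# Sketch of Figure 2's statistics (invented toy data, not the cohort's):
# Spearman rho between two analytes, and BH-adjusted p-values.
import math

def rankdata(xs):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def benjamini_hochberg(pvals):
    """BH step-up adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, prev = [0.0] * m, 1.0
    for rank_from_end, i in enumerate(reversed(order)):
        k = m - rank_from_end          # 1-based rank of p-value i
        prev = min(prev, pvals[i] * m / k)
        adj[i] = prev
    return adj

# Invented toy "analytes": perfectly monotone, so rho is 1.0.
hba1c = [5.0, 5.4, 5.9, 6.3, 7.1]
insulin = [4.0, 6.0, 9.0, 12.0, 20.0]
rho = spearman(hba1c, insulin)
padj = benjamini_hochberg([0.001, 0.04, 0.30])  # -> [0.003, 0.06, 0.30]
```

In the study this pair-wise test was run across every combination of data types, and only edges surviving padj < 0.05 were kept in the network.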
  • 69.
edges. The majority of edges involved a metabolite (3,309) or a clinical laboratory test (3,366), with an additional 20 edges involving the 130 genetic traits tested, 46 with microbiome taxa or diversity score, and 207 with quantified proteins. The inter-omic delta correlation network contained 822 nodes and 2,406 edges. 375 of the edges in the delta correlation network were also present in the cross-sectional network. The cross-sectional correlation network is provided in Supplementary Table 2 (inter-omic only) and Supplementary Table 3 (full). The delta correlation network is provided in Supplementary Table 4 (inter-omic only) and Supplementary Table 5 (full). We identified clusters of related measurements from the cross-sectional inter-omic correlation network using community analysis, an unsupervised (i.e., using unlabeled data to find hidden structure) approach that iteratively prunes the network (removing the edges with the highest betweenness) to reveal densely interconnected subgraphs (communities)11. Seventy communities of at least two vertices (mean of 10.9 V and 34.9 E) were identified in the cross-sectional inter-omic network at the cutoff with maximum community modularity12 (Supplementary Fig. 2), and are fully visualized as an interactive graph in Cytoscape13 (Supplementary Dataset 1). 70% of the edges in the cross-sectional network remained after community edge pruning. The communities often represented a cluster of physiologically related analytes, as described below.
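The community analysis described here (iteratively removing the highest-betweenness edge until densely interconnected subgraphs separate) is the Girvan-Newman idea. A stdlib-only toy sketch, not the authors' Cytoscape pipeline; the graph and node names are invented, and edge betweenness is simplified to counting one shortest path per node pair.

```python
# Girvan-Newman-style community detection on a tiny invented graph:
# repeatedly delete the edge carrying the most shortest paths until
# the network splits into the requested number of communities.
from collections import deque
from itertools import combinations

def components(nodes, edges):
    """Connected components of an undirected graph."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, queue = set(), deque([n])
        while queue:
            u = queue.popleft()
            if u in comp:
                continue
            comp.add(u)
            queue.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def edge_betweenness(nodes, edges):
    """For every edge, count how many pairwise shortest paths use it
    (one arbitrary shortest path per pair: a simplification)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    score = {frozenset(e): 0 for e in edges}
    for s, t in combinations(nodes, 2):
        prev, queue = {s: None}, deque([s])   # BFS from s toward t
        while queue and t not in prev:
            u = queue.popleft()
            for v in adj[u]:
                if v not in prev:
                    prev[v] = u
                    queue.append(v)
        if t not in prev:
            continue                           # s and t disconnected
        v = t
        while prev[v] is not None:             # walk the path back
            score[frozenset((v, prev[v]))] += 1
            v = prev[v]
    return score

def girvan_newman(nodes, edges, n_communities):
    edges = [tuple(e) for e in edges]
    while len(components(nodes, edges)) < n_communities and edges:
        score = edge_betweenness(nodes, edges)
        worst = max(edges, key=lambda e: score[frozenset(e)])
        edges.remove(worst)                    # prune the bridge
    return components(nodes, edges)

# Two dense triangles joined by one bridge edge ("a3", "b1"):
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
         ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
         ("a3", "b1")]
comms = girvan_newman(nodes, edges, 2)
```

The bridge edge carries every cross-cluster shortest path, so it is removed first and the two triangles emerge as communities, mirroring how physiologically related analytes cluster in the paper's network.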
[Figure 3 of the paper: the cardiometabolic community of the correlation network, drawn over metabolites (amino acid and lipid metabolism, nucleotides, energy, peptides, carbohydrates, xenobiotics, vitamins and cofactors), clinical labs (Quest and Genova diagnostics), microbiome taxa, genetic traits, and proteins (SRM liver panel, Olink CVD and inflammation panels); only the analyte labels survive extraction and are omitted here.]
Figure 3 Cardiometabolic community. All vertices and edges of the cardiometabolic community, with lines indicating significant (padj < 0.05) correlations. Associations with FGF21 (red lines) and gamma-glutamyltyrosine (purple lines) are highlighted.
• Analysis of the inter-omic correlation network grouped the measurements into several clusters
• The largest cluster (246 vertices, 1,645 edges): cardiometabolic health
• The four most connected clinical analytes: C-peptide, insulin, HOMA-IR, triglycerides
• The four most connected proteins: leptin, C-reactive protein, FGF21, INHBC
  • 70.
The largest community (246 V; 1,645 E) contains many clinical analytes associated with cardiometabolic health, such as C-peptide, triglycerides, insulin, homeostatic risk assessment–insulin resistance (HOMA-IR), fasting glucose, high-density lipid (HDL) cholesterol, and small low-density lipid (LDL) particle number (Fig. 3). The four most-connected clinical analytes by degree (the number of edges connecting a particular analyte) were C-peptide (degree 99), insulin (88), HOMA-IR (88), and triglycerides (75). The four most-connected proteins, measured using targeted (i.e., selected reaction monitoring analysis) mass spectrometry or Olink proximity extension assays, are by degree leptin (18), C-reactive protein (15), fibroblast growth factor 21 (FGF21) (14), and inhibin beta C chain (INHBC) (10). Leptin and C-reactive protein are indicators for cardiovascular risk14,15. FGF21 is positively correlated with the clinical analytes […] (ρ = −0.41; padj = 2.1 × 10−3). Hypothyroidism has long been recognized clinically as a cause of elevated cholesterol values19. A community formed around plasma serotonin (18 V; 25 E) containing 12 proteins listed in Supplementary Table 6, for which the most significant enrichment identified in a STRING ontology analysis20 was platelet activation (padj = 1.7 × 10−3) (Fig. 4b). Serotonin is known to induce platelet aggregation21; accordingly, selective serotonin reuptake inhibitors (SSRIs) may protect against myocardial infarction22. We identified several communities containing microbiome taxa, suggesting that there are specific microbiome–analyte relationships.
Hydrocinnamate, l-urobilin, and 5-hydroxyhexanoate clustered with the bacterial class Mollicutes and family Christensenellaceae (8 V; 8 E). Another community emerged around the Verrucomicrobiaceae and Desulfovibrionaceae families and p-cresol-sulfate (7 V; 6 E).
Figure 4 Cholesterol, serotonin, α-diversity, IBD, and bladder cancer communities. (a) Cholesterol community. (b) Serotonin community. (c) α-diversity community. (d) The polygenic score for inflammatory bowel disease is negatively correlated with cystine. (e) The polygenic score for bladder cancer is positively correlated with 5-acetylamino-6-formylamino-3-methyluracil (AFMU).
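Panels (d) and (e) rest on polygenic scores computed from GWAS results. As a purely hypothetical illustration (the SNP ids, effect weights, and dosages below are invented, and real scores aggregate many more variants), a polygenic score is just a weighted sum of risk-allele dosages, which can then be correlated with a blood analyte such as cystine across participants.

```python
# Hypothetical polygenic-score sketch; all SNPs and weights invented.
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0/1/2) with GWAS betas."""
    return sum(weights[snp] * dose for snp, dose in dosages.items())

weights = {"rs0001": 0.30, "rs0002": -0.12}   # invented effect sizes
participants = [
    {"rs0001": 2, "rs0002": 0},   # highest genetic risk
    {"rs0001": 1, "rs0002": 1},
    {"rs0001": 0, "rs0002": 2},   # lowest genetic risk
]
scores = [polygenic_score(p, weights) for p in participants]
# scores -> [0.6, 0.18, -0.24]
```

In the study, 127 such trait scores were computed per participant and then correlated against the molecular measurements, which is how the IBD-cystine and bladder-cancer-AFMU associations were found.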
  • 71.
identified with elevated fasting glucose or HbA1c at baseline (prediabetes), the coach made recommendations based on the Diabetes Prevention Program36, customized for each person’s lifestyle. These individual recommendations typically fell into one of several major factors (fasting insulin and HOMA-IR), and inflammation (IL-8 and TNF-alpha). Lipoprotein fractionation, performed by both laboratory companies, produced significant but discordant results for LDL particle number. We observed significant improvements in fasting
Table 1 Longitudinal analysis of clinical changes by round (changes in labs in participants out-of-range at baseline)
Health area | Name | N | Δ per round | P-value
Nutrition | Vitamin D | 95 | +7.2 ng/mL/round | 7.1 × 10−25
Nutrition | Mercury | 81 | −0.002 mcg/g/round | 8.9 × 10−9
Diabetes | HbA1c | 52 | −0.085%/round | 9.2 × 10−6
Cardiovascular | LDL particle number (Quest) | 30 | +130 nmol/L/round | 9.3 × 10−5
Nutrition | Methylmalonic acid (Genova) | 3 | −0.49 mmol/mol creatinine/round | 2.1 × 10−4
Cardiovascular | LDL pattern (A or B) | 28 | −0.16 /round | 4.8 × 10−4
Inflammation | Interleukin-8 | 10 | −6.1 pg/mL/round | 5.9 × 10−4
Cardiovascular | Total cholesterol (Quest) | 48 | −6.4 mg/dL/round | 7.2 × 10−4
Cardiovascular | LDL cholesterol | 57 | −4.8 mg/dL/round | 8.8 × 10−4
Cardiovascular | LDL particle number (Genova) | 70 | −69 nmol/L/round | 1.2 × 10−3
Cardiovascular | Small LDL particle number (Genova) | 73 | −56 nmol/L/round | 3.5 × 10−3
Diabetes | Fasting glucose (Quest) | 45 | −1.9 mg/dL/round | 8.2 × 10−3
Cardiovascular | Total cholesterol (Genova) | 43 | −5.4 mg/dL/round | 1.2 × 10−2
Diabetes | Insulin | 16 | −2.3 IU/mL/round | 1.5 × 10−2
Inflammation | TNF-alpha | 4 | −6.6 pg/mL/round | 1.8 × 10−2
Diabetes | HOMA-IR | 19 | −0.56 /round | 2.0 × 10−2
Cardiovascular | HDL cholesterol | 5 | +4.5 mg/dL/round | 2.2 × 10−2
Nutrition | Methylmalonic acid (Quest) | 7 | −42 nmol/L/round | 5.2 × 10−2
Cardiovascular | Triglycerides (Genova) | 14 | −18 mg/dL/round | 1.4 × 10−1
Diabetes | Fasting glucose (Genova) | 47 | −0.98 mg/dL/round | 1.5 × 10−1
Nutrition | Arachidonic acid | 35 | +0.24 wt%/round | 1.9 × 10−1
Inflammation | hs-CRP | 51 | −0.47 mcg/mL/round | 2.1 × 10−1
Cardiovascular | Triglycerides (Quest) | 17 | −14 mg/dL/round | 2.4 × 10−1
Nutrition | Glutathione | 6 | +11 micromol/L/round | 2.5 × 10−1
Nutrition | Zinc | 4 | −0.82 mcg/g/round | 3.0 × 10−1
Nutrition | Ferritin | 10 | −14 ng/mL/round | 3.1 × 10−1
Inflammation | Interleukin-6 | 4 | −1.1 pg/mL/round | 3.8 × 10−1
Cardiovascular | HDL large particle number | 8 | +210 nmol/L/round | 4.9 × 10−1
Nutrition | Copper | 10 | +0.006 mcg/g/round | 6.0 × 10−1
Nutrition | Selenium | 6 | +0.035 mcg/g/round | 6.2 × 10−1
Cardiovascular | Medium LDL particle number | 20 | +2.8 nmol/L/round | 8.5 × 10−1
Cardiovascular | Small LDL particle number (Quest) | 14 | −2.3 nmol/L/round | 8.8 × 10−1
Nutrition | Manganese | 0 | N/A | N/A
Nutrition | EPA | 0 | N/A | N/A
Nutrition | DHA | 0 | N/A | N/A
Generalized estimating equations (GEE) were used to calculate average changes in clinical laboratory tests over time, for those analytes that were actively coached on. The ‘Δ per round’ column is the average change in the population for that analyte by round, adjusted for age, sex, and self-reported ancestry. ‘Out-of-range at baseline’ indicates the average change using only those participants who were out-of-range for that analyte at the beginning of the study. Rows in boldface indicate statistically significant improvement, while the italicized row indicates statistically significant worsening. N/A values are present where no participants were out-of-range at baseline. For example, the average improvement in vitamin D for the 95 participants that began the study out-of-range was +7.2 ng/mL per round. Several analytes are measured by both Quest and Genova; with the exception of LDL particle number, the direction of effect for significantly changed analytes was concordant across the two laboratories. An independence working correlation structure was used in the GEE. See Supplementary Table 10 for the complete results.
• When a value fell out of the normal range, a coach intervened to induce lifestyle changes that could improve it
• For example, elevated fasting glucose or HbA1c: the coach recommended the Diabetes Prevention Program (DPP)
• The recommendations fell into a few major categories
• Diet, exercise, stress management, dietary supplements, physician referral
• The values that improved most as a result
• Vitamin D, mercury, HbA1c
• Overall, cholesterol-related values, diabetes-risk values, and inflammation markers improved
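A much-simplified stand-in for the "Δ per round" estimates in Table 1: the paper fit generalized estimating equations adjusted for age, sex, and ancestry, whereas this toy just fits an ordinary least-squares slope of an analyte against round number, pooling participants. The vitamin D values are invented.

```python
# Simplified "per round" change estimate (OLS slope, invented data);
# the study itself used GEE with an independence working correlation.
def slope_per_round(rounds, values):
    """Ordinary least-squares slope of values against round number."""
    n = len(rounds)
    mr, mv = sum(rounds) / n, sum(values) / n
    num = sum((r - mr) * (v - mv) for r, v in zip(rounds, values))
    den = sum((r - mr) ** 2 for r in rounds)
    return num / den

# Vitamin D (ng/mL) for two hypothetical out-of-range participants
# measured at rounds 1, 2, 3:
rounds = [1, 2, 3, 1, 2, 3]
vitd = [18.0, 25.5, 32.0, 21.0, 28.0, 36.5]
change = slope_per_round(rounds, vitd)   # positive slope = improving
```

On this toy data the slope is about +7.4 ng/mL per round, i.e. the same kind of quantity reported in the Vitamin D row of Table 1 (there, +7.2 ng/mL/round for the 95 out-of-range participants).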
  • 72.
• Verily (Google)'s Baseline Project
• A project to redefine health and disease
• Closely tracks the health of 10,000 individuals over four years to accumulate data
• Heart rate, sleep patterns, genetic information, emotional state, medical records, family history, urine/saliva/blood tests, and more
  • 74.
• Verily's 'Study Watch'
• A smartwatch for the Baseline Study, unveiled in April 2017
• Measures ECG, heart rate, electrodermal activity (EDA), inertial movement, and more
• Features designed for long-term follow-up studies: battery life (one week), on-device data storage, synchronization (once a week)
  • 76.
• Linda Avey's Precise.ly
• Founded in 2011 by Linda Avey, a co-founder of 23andMe, after she left that company in 2009
• Recently renamed from 'We Are Curious' to Precise.ly
  • 78.
• Linda Avey's Precise.ly
• Combines genotype + phenotype + microbiome + environment to produce medical insight
• Genotype: analyzed by whole exome sequencing (WES) on Helix's platform
• Phenotype: collected with wearables and IoT devices
  • 79.
• States that it will mainly target 'modern diseases'
• For example, can autism spectrum disorder be subtyped on the basis of multidimensional data?
• Plans to first release an app for chronic fatigue on the Helix platform,
• then apps for autism, Parkinson's disease, and others.
  • 80.
iCarbonX
• Founded by Jun Wang, former CEO of China's BGI
• Plans to 'measure all the data' and apply it to precision medicine
• Invests in and acquires companies with the capability to measure such data
• SomaLogic, HealthTell, PatientsLikeMe
• Plans to collect data on 1 to 10 million people over the next five years
• The data will be analyzed with artificial intelligence
  • 81.
Arivale, the Baseline Project, Precise.ly, and iCarbonX are not all flourishing at the moment, but they can be read as early attempts at this kind of change.
  • 82.
  • 83.
So what are we going to do with all that data?
  • 84.
Only when you ask a good question do you get a good answer.
  • 86.
Martin Duggan, “IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing” What are today's medical students and residents learning?
  • 87.
The three types of medical artificial intelligence:
• Analysis of complex medical data and derivation of insight
• Analysis and reading of medical imaging and pathology data
• Monitoring of continuous data for preventive/predictive medicine
  • 89.
Jeopardy! In 2011, Watson competed against two human champions in a quiz match and won decisively.
  • 93.
ORIGINAL ARTICLE Watson for Oncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board S. P. Somashekhar1*, M.-J. Sepúlveda2, S. Puglielli3, A. D. Norden3, E. H. Shortliffe4, C. Rohit Kumar1, A. Rauthan1, N. Arun Kumar1, P. Patil1, K. Rhee3 & Y. Ramya1 1 Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; 2 IBM Research (Retired), Yorktown Heights; 3 Watson Health, IBM Corporation, Cambridge; 4 Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA *Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer. Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in 2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases. Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P = 0.02; P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status was not found to affect concordance. Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not. This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment decision making, especially at centers where expert breast cancer resources are limited. Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer, concordance, multidisciplinary tumor board Introduction Oncologists who treat breast cancer are challenged by a large and rapidly expanding knowledge base [1, 2]. As of October 2017, for example, there were 69 FDA-approved drugs for the treatment of breast cancer, not including combination treatment regimens [3]. The growth of massive genetic and clinical databases, along with computing systems to exploit them, will accelerate the speed of breast cancer treatment advances and shorten the cycle time for changes to breast cancer treatment guidelines [4, 5]. In addition, these information management challenges in cancer care are occurring in a practice environment where there is little time available for tracking and accessing relevant information at the point of care [6].
For example, a study that surveyed 1117 oncologists reported that on average 4.6 h per week were spent keeping […]. © The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com. Annals of Oncology 29: 418–423, 2018 doi:10.1093/annonc/mdx781 Published online 9 January 2018
WFO currently lacks evidence of accuracy and utility, but will that still be the case 10 years from now?
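The concordance definition in the methods (a tumor-board recommendation counts as concordant when WFO designates it 'recommended' or 'for consideration') can be sketched in a few lines. The case labels below are invented, not the study's data.

```python
# Sketch of the WFO concordance metric on invented case labels.
CONCORDANT = {"recommended", "for consideration"}

def concordance(wfo_designations):
    """Fraction of cases where WFO's designation of the tumor board's
    chosen treatment falls in the concordant set."""
    hits = sum(1 for label in wfo_designations if label in CONCORDANT)
    return hits / len(wfo_designations)

cases = ["recommended", "for consideration", "not recommended",
         "recommended", "not available", "recommended"]
rate = concordance(cases)   # 4 of 6 cases concordant
```

Applied to the real designations, this is the computation behind the paper's headline 93% figure (591 of 638 cases after the blinded second review).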
  • 94.
    ORIGINAL ARTICLE Watson forOncology and breast cancer treatment recommendations: agreement with an expert multidisciplinary tumor board S. P. Somashekhar1*, M.-J. Sepu´lveda2 , S. Puglielli3 , A. D. Norden3 , E. H. Shortliffe4 , C. Rohit Kumar1 , A. Rauthan1 , N. Arun Kumar1 , P. Patil1 , K. Rhee3 & Y. Ramya1 1 Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; 2 IBM Research (Retired), Yorktown Heights; 3 Watson Health, IBM Corporation, Cambridge; 4 Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA *Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: þ91-9845712012; Fax: þ91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer. Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in 2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO. 
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases. Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P 0.02; P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status was not found to affect concordance. Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not. This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment decision making, especially at centers where expert breast cancer resources are limited. Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer, concordance, multidisciplinary tumor board Introduction Oncologists who treat breast cancer are challenged by a large and rapidly expanding knowledge base [1, 2]. As of October 2017, for example, there were 69 FDA-approved drugs for the treatment of breast cancer, not including combination treatment regimens [3]. The growth of massive genetic and clinical databases, along with computing systems to exploit them, will accelerate the speed of breast cancer treatment advances and shorten the cycle time for changes to breast cancer treatment guidelines [4, 5]. In add- ition, these information management challenges in cancer care are occurring in a practice environment where there is little time available for tracking and accessing relevant information at the point of care [6]. 
For example, a study that surveyed 1117 oncolo- gists reported that on average 4.6 h per week were spent keeping VC The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com. Annals of Oncology 29: 418–423, 2018 doi:10.1093/annonc/mdx781 Published online 9 January 2018 Downloaded from https://academic.oup.com/annonc/article-abstract/29/2/418/4781689 by guest Table 2. MMDT and WFO recommendations after the initial and blinded second reviews Review of breast cancer cases (N 5 638) Concordant cases, n (%) Non-concordant cases, n (%) Recommended For consideration Total Not recommended Not available Total Initial review (T1MMDT versus T2WFO) 296 (46) 167 (26) 463 (73) 137 (21) 38 (6) 175 (27) Second review (T2MMDT versus T2WFO) 397 (62) 194 (30) 591 (93) 36 (5) 11 (2) 47 (7) T1MMDT, original MMDT recommendation from 2014 to 2016; T2WFO, WFO advisor treatment recommendation in 2016; T2MMDT, MMDT treatment recom- mendation in 2016; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology. 31% 18% 1% 2% 33% 5% 31% 6% 0% 10% 20% Not available Not recommended RecommendedFor consideration 30% 40% 50% 60% 70% 80% 90% 100% 8% 25% 61% 64% 64% 29% 51% 62% Concordance, 93% Concordance, 80% Concordance, 97% Concordance, 95% Concordance, 86% 2% 2% Overall (n=638) Stage I (n=61) Stage II (n=262) Stage III (n=191) Stage IV (n=124) 5% Figure 1. Treatment concordance between WFO and the MMDT overall and by stage. MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology. 
Figure 2. Treatment concordance between WFO and the MMDT by stage and receptor status, across HR(+), HER2/neu(+), and triple-negative subgroups, each split into non-metastatic and metastatic disease; subgroup concordance ranged from 75% to 98%. HER2/neu, human epidermal growth factor receptor 2; HR, hormone receptor; MMDT, Manipal multidisciplinary tumor board; WFO, Watson for Oncology.

WFO currently lacks evidence of accuracy and utility. But will that still be the case ten years from now?
  • 95.
IBM Watson Health: Watson for Clinical Trial Matching (CTM)

Current challenges: searching across the eligibility criteria of clinical trials is time consuming and labor intensive. Fewer than 5% of adult cancer patients participate in clinical trials.1 37% of sites fail to meet minimum enrollment targets, and 11% of sites fail to enroll a single patient.2

The Watson solution: uses structured and unstructured patient data to quickly check eligibility across relevant clinical trials; provides eligible trial considerations ranked by relevance; increases the speed of qualifying patients.

Clinical investigators (opportunity): trials to patient: perform feasibility analysis for a trial, identify the sites with the most potential for patient enrollment, and optimize inclusion/exclusion criteria in protocols. Faster, more efficient recruitment strategies and better designed protocols.

Point of care (offering): patient to trials: quickly find the right trial for which a patient might be eligible among the hundreds of open trials available. Improved patient care quality and consistency, and increased efficiency.

1. According to the National Comprehensive Cancer Network (NCCN). 2. http://csdd.tufts.edu/files/uploads/02_-_jan_15,_2013_-_recruitment-retention.pdf
  • 96.
•Predicting whether a first cardiovascular event will occur within the next 10 years
•Prospective cohort study: 378,256 patients in the UK
•The first large-scale study to predict disease with machine learning from routine clinical data
•Compared the accuracy of the existing ACC/AHA guideline against four machine-learning algorithms
•Random forest; logistic regression; gradient boosting; neural network
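The comparison above hinges on discrimination, measured by AUROC. As a minimal illustration (not the study's code; the labels and scores below are made up), AUROC can be computed directly as the probability that a randomly chosen positive case is ranked above a randomly chosen negative one:

```python
# Illustrative AUROC via the Mann-Whitney formulation, stdlib only.
# "guideline_scores" stands in for an ACC/AHA-style risk score and
# "model_scores" for an ML model's output; both are invented numbers.

def auroc(labels, scores):
    """Probability that a positive case outranks a negative case
    (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]
guideline_scores = [0.8, 0.4, 0.3, 0.5, 0.2, 0.6, 0.1]  # invented guideline risk scores
model_scores     = [0.9, 0.7, 0.6, 0.4, 0.3, 0.5, 0.1]  # invented ML model scores
print(auroc(labels, guideline_scores), auroc(labels, model_scores))
```

Comparing the two numbers on the same held-out labels is, in essence, how the study ranks the guideline against each algorithm.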
  • 97.
•In January 2018, Google announced an AI that analyzes electronic medical records (EMR) to predict patient outcomes:
•whether the patient will die during the hospital stay
•whether the hospital stay will be prolonged
•whether the patient will be readmitted within 30 days of discharge
•the diagnoses at discharge
•The distinguishing feature of this study: scalability
•Unlike previous work, no subset of the EMR data was selected and pre-processed;
•the entire EMR was analyzed as-is, at two centers: UCSF and UCM (University of Chicago Medicine)
•including unstructured data such as physicians' clinical notes
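The "whole-EMR" idea above can be sketched roughly (illustrative Python; the class and field names are my assumptions, not the paper's actual FHIR schema): every event from every source is sorted into one chronological token sequence that a deep model can consume, with no per-variable curation.

```python
# Minimal sketch of a FHIR-style sequential representation of a patient
# record. All names and values are illustrative.
from dataclasses import dataclass

@dataclass
class EHREvent:
    time: int      # e.g. minutes since admission
    resource: str  # "Observation", "MedicationOrder", "Note", ...
    value: str     # coded value or raw text token

def unroll(events):
    """Sort every event from every source into one chronological token
    sequence - the 'sequential format' a deep model can consume."""
    return [f"{e.resource}:{e.value}" for e in sorted(events, key=lambda e: e.time)]

record = [
    EHREvent(120, "MedicationOrder", "furosemide"),
    EHREvent(5,   "Observation", "heart_rate=110"),
    EHREvent(60,  "Note", "worsening dyspnea"),  # unstructured note text included
]
print(unroll(record))
# → ['Observation:heart_rate=110', 'Note:worsening dyspnea', 'MedicationOrder:furosemide']
```

Unrolling records this way, across 216,221 patients, is what produced the paper's 46 billion data points.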
  • 98.
ARTICLE OPEN. Scalable and accurate deep learning with electronic health records. Alvin Rajkomar1,2, Eyal Oren1, Kai Chen1, Andrew M. Dai1, Nissan Hajaj1, Michaela Hardt1, Peter J. Liu1, Xiaobing Liu1, Jake Marcus1, Mimi Sun1, Patrik Sundberg1, Hector Yee1, Kun Zhang1, Yi Zhang1, Gerardo Flores1, Gavin E. Duggan1, Jamie Irvine1, Quoc Le1, Kurt Litsch1, Alexander Mossin1, Justin Tansuwan1, De Wang1, James Wexler1, Jimbo Wilson1, Dana Ludwig2, Samuel L. Volchenboum3, Katherine Chou1, Michael Pearson1, Srinivasan Madabushi1, Nigam H. Shah4, Atul J. Butte2, Michael D. Howell1, Claire Cui1, Greg S. Corrado1 and Jeffrey Dean1

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90).
These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart. npj Digital Medicine (2018)1:18; doi:10.1038/s41746-018-0029-1

INTRODUCTION. The promise of digital medicine stems in part from the hope that, by digitizing health data, we might more easily leverage computer information systems to understand and improve care. In fact, routinely collected patient healthcare data are now approaching the genomic scale in volume and complexity.1 Unfortunately, most of this information is not yet used in the sorts of predictive statistical models clinicians might use to improve care delivery. It is widely suspected that use of such efforts, if successful, could provide major benefits not only for patient safety and quality but also in reducing healthcare costs.2–6 In spite of the richness and potential of available data, scaling the development of predictive models is difficult because, for traditional predictive modeling techniques, each outcome to be predicted requires the creation of a custom dataset with specific variables.7 It is widely held that 80% of the effort in an analytic model is preprocessing, merging, customizing, and cleaning datasets, not analyzing them for insights.
Traditional modeling approaches have dealt with this complexity simply by choosing a very limited number of commonly collected variables to consider.7 This is problematic because the resulting models may produce imprecise predictions: false-positive predictions can overwhelm physicians, nurses, and other providers with false alarms and concomitant alert fatigue,10 which the Joint Commission identified as a national patient safety priority in 2014.11 False-negative predictions can miss significant numbers of clinically important events, leading to poor clinical outcomes.11,12 Incorporating the entire EHR, including clinicians’ free-text notes, offers some hope of overcoming these shortcomings but is unwieldy for most predictive modeling techniques. Recent developments in deep learning and artificial neural networks may allow us to address many of these challenges and unlock the information in the EHR. Deep learning emerged as the preferred machine learning approach in machine perception.
  • 99.
    Three types of medical AI: • analysis of complex medical data to derive insights • analysis and interpretation of medical images and pathology data • continuous monitoring of data for prediction and prevention
  • 100.
  • 101.
  • 102.
  • 103.
  • 104.
  • 107.
  • 108.
    Fig 1. What can consumer wearables do? Heart rate can be measured with an oximeter built into a ring [3], muscle activity with an electromyographic sensor embedded into clothing [4], stress with an electrodermal sensor incorporated into a wristband [5], and physical activity or sleep patterns via an accelerometer in a watch [6,7]. In addition, a female’s most fertile period can be identified with detailed body temperature tracking [8], while levels of mental attention can be monitored with a small number of non-gelled electroencephalogram (EEG) electrodes [9]. (PLOS Medicine, 2016)
  • 109.
    Medical applications of AI: • analysis of complex medical data to derive insights • analysis and interpretation of medical images and pathology data • continuous monitoring of data for prediction and prevention
  • 110.
  • 112.
    Sugar.IQ: based on the user’s past records of food intake, the resulting blood glucose changes, and insulin dosing, Watson predicts how the user’s blood glucose will change after a meal.
  • 114.
    An Algorithm Based on Deep Learning for Predicting In-Hospital Cardiac Arrest. Joon-myoung Kwon, MD;* Youngnam Lee, MS;* Yeha Lee, PhD; Seungwoo Lee, BS; Jinsik Park, MD, PhD

Background: In-hospital cardiac arrest is a major burden to public health, which affects patient safety. Although traditional track-and-trigger systems are used to predict cardiac arrest early, they have limitations, with low sensitivity and high false-alarm rates. We propose a deep learning–based early warning system that shows higher performance than the existing track-and-trigger systems.

Methods and Results: This retrospective cohort study reviewed patients who were admitted to 2 hospitals from June 2010 to July 2017. A total of 52 131 patients were included. Specifically, a recurrent neural network was trained using data from June 2010 to January 2017. The result was tested using the data from February to July 2017. The primary outcome was cardiac arrest, and the secondary outcome was death without attempted resuscitation. As comparative measures, we used the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), and the net reclassification index. Furthermore, we evaluated sensitivity while varying the number of alarms. The deep learning–based early warning system (AUROC: 0.850; AUPRC: 0.044) significantly outperformed a modified early warning score (AUROC: 0.603; AUPRC: 0.003), a random forest algorithm (AUROC: 0.780; AUPRC: 0.014), and logistic regression (AUROC: 0.613; AUPRC: 0.007). Furthermore, the deep learning–based early warning system reduced the number of alarms by 82.2%, 13.5%, and 42.1% compared with the modified early warning system, random forest, and logistic regression, respectively, at the same sensitivity.

Conclusions: An algorithm based on deep learning had high sensitivity and a low false-alarm rate for detection of patients with cardiac arrest in the multicenter study. (J Am Heart Assoc. 2018;7:e008678.
DOI: 10.1161/JAHA.118.008678.) Key Words: artificial intelligence • cardiac arrest • deep learning • machine learning • rapid response system • resuscitation In-hospital cardiac arrest is a major burden to public health, which affects patient safety.1–3 More than a half of cardiac arrests result from respiratory failure or hypovolemic shock, and 80% of patients with cardiac arrest show signs of deterioration in the 8 hours before cardiac arrest.4–9 However, 209 000 in-hospital cardiac arrests occur in the United States each year, and the survival discharge rate for patients with cardiac arrest is <20% worldwide.10,11 Rapid response systems (RRSs) have been introduced in many hospitals to detect cardiac arrest using the track-and-trigger system (TTS).12,13 Two types of TTS are used in RRSs. For the single-parameter TTS (SPTTS), cardiac arrest is predicted if any single vital sign (eg, heart rate [HR], blood pressure) is out of the normal range.14 The aggregated weighted TTS calculates a weighted score for each vital sign and then finds patients with cardiac arrest based on the sum of these scores.15 The modified early warning score (MEWS) is one of the most widely used approaches among all aggregated weighted TTSs (Table 1)16 ; however, traditional TTSs including MEWS have limitations, with low sensitivity or high false-alarm rates.14,15,17 Sensitivity and false-alarm rate interact: Increased sensitivity creates higher false-alarm rates and vice versa. Current RRSs suffer from low sensitivity or a high false- alarm rate. An RRS was used for only 30% of patients before unplanned intensive care unit admission and was not used for 22.8% of patients, even if they met the criteria.18,19 From the Departments of Emergency Medicine (J.-m.K.) and Cardiology (J.P.), Mediplex Sejong Hospital, Incheon, Korea; VUNO, Seoul, Korea (Youngnam L., Yeha L., S.L.). *Dr Kwon and Mr Youngnam Lee contributed equally to this study. 
Correspondence to: Joon-myoung Kwon, MD, Department of Emergency Medicine, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon 21080, Korea. E-mail: kwonjm@sejongh.co.kr Received January 18, 2018; accepted May 31, 2018. © 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. DOI: 10.1161/JAHA.118.008678 Journal of the American Heart Association. ORIGINAL RESEARCH
  • 115.
    •Number of patients: 86,290
•Cardiac arrests: 633
•Input: heart rate, respiratory rate, body temperature, systolic blood pressure (source: VUNO)
Cardiac Arrest Prediction Accuracy
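With only 633 arrests among 86,290 patients (roughly 0.7% prevalence), the precision-recall curve is the more demanding metric the DEWS study reports alongside AUROC. A minimal average-precision-style AUPRC sketch (illustrative data, not VUNO's code):

```python
# Average-precision-style AUPRC from labels and scores, stdlib only.
# Labels and scores below are invented for illustration.

def auprc(labels, scores):
    """Average precision: mean of the precision at each true-positive rank."""
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    total_pos = sum(labels)
    ap = 0.0
    for _, label in ranked:
        if label == 1:
            tp += 1
            ap += tp / (tp + fp)  # precision at this recall step
        else:
            fp += 1
    return ap / total_pos

labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(auprc(labels, scores))  # → 0.75
```

On rare events a model can post a high AUROC while its AUPRC stays tiny (as in the 0.850 vs 0.044 figures above), which is why both are reported.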
  • 116.
    •At alarm volumes that a university hospital rapid response team can actually handle (points A and B), the accuracy gap is even larger
•A: DEWS 33.0%, MEWS 0.3%
•B: DEWS 42.7%, MEWS 4.0%
(source: VUNO) APPH (Alarms Per Patient Per Hour) (source: VUNO) Fewer False Alarms
  • 117.
  • 118.
    Cardiogram
•Cardiogram, a Silicon Valley startup, builds its service on heart rate data measured by the Apple Watch
•Raised a $2M investment from Andreessen Horowitz in October 2016
  • 119.
    Passive Detection of Atrial Fibrillation Using a Commercially Available Smartwatch. Geoffrey H. Tison, MD, MPH; José M. Sanchez, MD; Brandon Ballinger, BS; Avesh Singh, MS; Jeffrey E. Olgin, MD; Mark J. Pletcher, MD, MPH; Eric Vittinghoff, PhD; Emily S. Lee, BA; Shannon M. Fan, BA; Rachel A. Gladstone, BA; Carlos Mikell, BS; Nimit Sohoni, BS; Johnson Hsieh, MS; Gregory M. Marcus, MD, MAS

IMPORTANCE Atrial fibrillation (AF) affects 34 million people worldwide and is a leading cause of stroke. A readily accessible means to continuously monitor for AF could prevent large numbers of strokes and death.

OBJECTIVE To develop and validate a deep neural network to detect AF using smartwatch data.

DESIGN, SETTING, AND PARTICIPANTS In this multinational cardiovascular remote cohort study coordinated at the University of California, San Francisco, smartwatches were used to obtain heart rate and step count data for algorithm development. A total of 9750 participants enrolled in the Health eHeart Study and 51 patients undergoing cardioversion at the University of California, San Francisco, were enrolled between February 2016 and March 2017. A deep neural network was trained using a method called heuristic pretraining in which the network approximated representations of the R-R interval (ie, time between heartbeats) without manual labeling of training data. Validation was performed against the reference standard 12-lead electrocardiography (ECG) in a separate cohort of patients undergoing cardioversion. A second exploratory validation was performed using smartwatch data from ambulatory individuals against the reference standard of self-reported history of persistent AF. Data were analyzed from March 2017 to September 2017.

MAIN OUTCOMES AND MEASURES The sensitivity, specificity, and receiver operating characteristic C statistic for the algorithm to detect AF were generated based on the reference standard of 12-lead ECG–diagnosed AF.
RESULTS Of the 9750 participants enrolled in the remote cohort, including 347 participants with AF, 6143 (63.0%) were male, and the mean (SD) age was 42 (12) years. There were more than 139 million heart rate measurements on which the deep neural network was trained. The deep neural network exhibited a C statistic of 0.97 (95% CI, 0.94-1.00; P < .001) to detect AF against the reference standard 12-lead ECG–diagnosed AF in the external validation cohort of 51 patients undergoing cardioversion; sensitivity was 98.0% and specificity was 90.2%. In an exploratory analysis relying on self-report of persistent AF in ambulatory participants, the C statistic was 0.72 (95% CI, 0.64-0.78); sensitivity was 67.7% and specificity was 67.6%. CONCLUSIONS AND RELEVANCE This proof-of-concept study found that smartwatch photoplethysmography coupled with a deep neural network can passively detect AF but with some loss of sensitivity and specificity against a criterion-standard ECG. Further studies will help identify the optimal role for smartwatch-guided rhythm assessment. JAMA Cardiol. doi:10.1001/jamacardio.2018.0136 Published online March 21, 2018. Editorial Supplemental content and Audio Author Affiliations: Division of Cardiology, Department of Medicine, University of California, San Francisco (Tison, Sanchez, Olgin, Lee, Fan, Gladstone, Mikell, Marcus); Cardiogram Incorporated, San Francisco, California (Ballinger, Singh, Sohoni, Hsieh); Department of Epidemiology and Biostatistics, University of California, San Francisco (Pletcher, Vittinghoff). Corresponding Author: Gregory M. Marcus, MD, MAS, Division of Cardiology, Department of Medicine, University of California, San Francisco, 505 Parnassus Ave, M1180B, San Francisco, CA 94143- 0124 (marcusg@medicine.ucsf.edu). Research JAMA Cardiology | Original Investigation (Reprinted) E1 © 2018 American Medical Association. All rights reserved.
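The reported 98.0% sensitivity and 90.2% specificity follow from a simple confusion-matrix computation; a toy sketch (made-up labels, not the study's data):

```python
# Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), computed from
# binary truth (ECG-diagnosed AF) and binary predictions. Data invented.

def sens_spec(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # AF per 12-lead ECG
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # smartwatch-algorithm calls
sens, spec = sens_spec(y_true, y_pred)
print(sens, spec)  # → 0.75 0.8333333333333334
```

The drop from 98.0%/90.2% (cardioversion cohort) to 67.7%/67.6% (ambulatory self-report) shows how much the reference standard matters.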
  • 120.
    •ZIO Patch
•A medical device cleared by the FDA in 2009
•Worn for up to two weeks while continuously recording the ECG
  • 121.
    Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Table 1. Class-level F1 score, model vs. averaged cardiologist, for the Sequence and Set evaluations:

Class        Seq Model  Seq Cardiol.  Set Model  Set Cardiol.
AFIB         0.604      0.515         0.667      0.544
AFL          0.687      0.635         0.679      0.646
AVB TYPE2    0.689      0.535         0.656      0.529
BIGEMINY     0.897      0.837         0.870      0.849
CHB          0.843      0.701         0.852      0.685
EAR          0.519      0.476         0.571      0.529
IVR          0.761      0.632         0.774      0.720
JUNCTIONAL   0.670      0.684         0.783      0.674
NOISE        0.823      0.768         0.704      0.689
SINUS        0.879      0.847         0.939      0.907
SVT          0.477      0.449         0.658      0.556
TRIGEMINY    0.908      0.843         0.870      0.816
VT           0.506      0.566         0.694      0.769
WENCKEBACH   0.709      0.593         0.806      0.736

Aggregate results:
Precision (PPV)       0.800  0.723  0.809  0.763
Recall (Sensitivity)  0.784  0.724  0.827  0.744
F1                    0.776  0.719  0.809  0.751

Model vs. Cardiologist Performance: Table 1 shows the breakdown of both cardiologist and model F1 scores across the different rhythm classes, with the overall F1 (and precision and recall) as a weighted mean. The model outperforms the average cardiologist on most rhythms, noticeably outperforming the cardiologists in the AV Block set of arrhythmias: Mobitz I (Wenckebach), Mobitz II (AVB TYPE2), and complete heart block (CHB).
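The aggregate row of Table 1 combines the per-class scores; a plausible sketch of a frequency-weighted F1 (the per-class precision/recall numbers and supports below are invented, not the paper's):

```python
# Frequency-weighted F1: per-class F1 from precision/recall, then a mean
# weighted by class support. All per-class numbers here are illustrative.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-class stats: class -> (precision, recall, support)
stats = {
    "SINUS": (0.95, 0.93, 700),
    "AFIB":  (0.70, 0.62, 150),
    "NOISE": (0.72, 0.69, 150),
}
total = sum(n for _, _, n in stats.values())
weighted_f1 = sum(f1(p, r) * n for p, r, n in stats.values()) / total
print(round(weighted_f1, 3))  # → 0.862
```

Because frequent classes like normal sinus rhythm dominate such a weighted mean, class-level rows (e.g. the AV blocks) remain the more informative comparison.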
  • 122.
    How should AI be used? (1) Proving the synergy of AI + physician
  • 123.
    •An AI that reads hand X-ray images and calculates the patient’s bone age
•Conventionally, physicians read bone age by comparing the X-ray with standard images, e.g. using the Greulich-Pyle method
•The AI finds sex- and age-specific patterns in the reference standard images, expresses the similarity as probabilities, and retrieves matching standard images
•It can help physicians diagnose precocious puberty or delayed growth
  • 124.
    Computerized Bone Age Estimation Using Deep Learning–Based Program: Evaluation of the Accuracy and Efficiency. Jeong Rye Kim1, Woo Hyun Shim1, Hee Mang Yoon1, Sang Hyup Hong1, Jin Seong Lee1, Young Ah Cho1, Sangki Kim2. AJR 2017; 209:1–7 (AJR:209, December 2017). Pediatric Imaging, Original Research.

1 Department of Radiology and Research Institute of Radiology, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, South Korea. Address correspondence to H. M. Yoon (espoirhm@gmail.com). 2 Vuno Research Center, Vuno Inc., Seoul, South Korea.

Since 1992, concerns regarding interobserver variability in manual bone age estimation [4] have led to the establishment of several automatic computerized methods for bone age estimation, including computer-assisted skeletal age scores, computer-aided skeletal maturation assessment systems, and BoneXpert (Visiana) [5–14]. BoneXpert was developed according to traditional machine-learning techniques and has been shown to have a good performance for patients of various ethnicities and in various clinical settings [10–14]. The deep-learning technique is an improvement in artificial neural networks. Unlike traditional machine-learning techniques, deep-learning techniques allow an algorithm to program itself by learning from the images given a large dataset of labeled examples, thus removing the need to specify rules [15]. Deep-learning techniques permit higher levels of abstraction and improved predictions from data.

Bone age estimation is crucial for developmental status determinations and ultimate height predictions in the pediatric population, particularly for patients with growth disorders and endocrine abnormalities [1].
Two major left-hand wrist radiograph-based methods for bone age estimation are currently used: the Greulich-Pyle [2] and Tanner-Whitehouse [3] methods. The former is much more frequently used in clinical practice. Greulich-Pyle–based bone age estimation is performed by comparing a patient’s left-hand radiograph to standard radiographs in the Greulich-Pyle atlas and is therefore simple and easily applied in clinical practice. However, the process of bone age estimation, which comprises a simple comparison of multiple images, can be repetitive and time consuming and is thus sometimes burdensome to radiologists. Moreover, the accuracy depends on the radiologist’s experience and tends to be subjective. Keywords: bone age, children, deep learning, neural network model. DOI:10.2214/AJR.17.18224. J. R. Kim and W. H. Shim contributed equally to this work. Received March 12, 2017; accepted after revision July 7, 2017. S. Kim is employed by Vuno, Inc., which created the deep learning–based automatic software system for bone age determination. J. R. Kim, W. H. Shim, H. M. Yoon, S. H. Hong, J. S. Lee, and Y. A. Cho are employed by Asan Medical Center, which holds patent rights for the deep learning–based automatic software system for bone age assessment.

OBJECTIVE. The purpose of this study is to evaluate the accuracy and efficiency of a new automatic software system for bone age assessment and to validate its feasibility in clinical practice.

MATERIALS AND METHODS. A Greulich-Pyle method–based deep-learning technique was used to develop the automatic software system for bone age determination. Using this software, bone age was estimated from left-hand radiographs of 200 patients (3–17 years old) using first-rank bone age (software only), computer-assisted bone age (two radiologists with software assistance), and Greulich-Pyle atlas–assisted bone age (two radiologists with Greulich-Pyle atlas assistance only).
The reference bone age was determined by the consensus of two experienced radiologists.

RESULTS. First-rank bone ages determined by the automatic software system showed a 69.5% concordance rate and significant correlations with the reference bone age (r = 0.992; p < 0.001). Concordance rates increased with the use of the automatic software system for both reviewer 1 (63.0% for Greulich-Pyle atlas–assisted bone age vs 72.5% for computer-assisted bone age) and reviewer 2 (49.5% for Greulich-Pyle atlas–assisted bone age vs 57.5% for computer-assisted bone age). Reading times were reduced by 18.0% and 40.0% for reviewers 1 and 2, respectively.

CONCLUSION. The automatic software system showed reliably accurate bone age estimations and appeared to enhance efficiency by reducing reading times without compromising the diagnostic accuracy.

• Total number of patients: 200
• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)
• Physician A: board-certified radiologist subspecialized in pediatric imaging (over 500 readings)
• Physician B: second-year radiology resident (one-day training in the reading method plus 20 readings)
• AI: VUNO’s deep learning bone age reader
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
  • 125.
    AI vs. physicians: bone age reading accuracy was 69.5% for the AI, 63% for Physician A (radiology fellow subspecialized in pediatric imaging), and 49.5% for Physician B (second-year radiology resident). Synergy between human physicians and AI in bone age reading. AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380. Digital Healthcare Institute Director, Yoon Sup Choi, PhD, yoonsup.choi@gmail.com
  • 126.
    AI vs. physicians, and physicians + AI: accuracy was 69.5% for the AI alone; Physician A improved from 63% to 72.5% with the AI, and Physician B from 49.5% to 57.5%. (Physician A: radiology fellow subspecialized in pediatric imaging; Physician B: second-year radiology resident.) Synergy between human physicians and AI in bone age reading. AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
  • 127.
    Total reading time: with AI assistance, Physician A’s reading time fell from 188 to 154 minutes (an 18% saving) and Physician B’s from 180 to 108 minutes (a 40% saving). Using AI in bone age reading can therefore also reduce reading time. AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
  • 128.
    ORIGINAL RESEARCH • THORACIC IMAGING

Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs. Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • Kun Young Lim, MD, PhD • Thienkai Huy Vu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD

From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P. (e-mail: cmpark.morphius@gmail.com). Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002). *J.G.N. and S.P. contributed equally to this work. Conflicts of interest are listed at the end of this article.
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237

Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists.

Materials and Methods: For this retrospective study, DLAD was developed by using 43292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34067:9225) in 34676 patients (healthy-to-nodule ratio, 30784:3892; 19230 men [mean age, 52.8 years; age range, 18–99 years]; 15446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.

Results: According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively.
DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).

Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader. ©RSNA, 2018. Online supplemental material is available for this article.
  • 129.
    This copy isfor personal use only. To order printed copies, contact reprints@rsna.org This copy is for personal use only. To order printed copies, contact reprints@rsna.org ORIGINAL RESEARCH • THORACIC IMAGING hest radiography, one of the most common diagnos- intraobserver agreements because of its limited spatial reso- Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs Ju Gang Nam, MD* • Sunggyun Park, PhD* • Eui Jin Hwang, MD • Jong Hyuk Lee, MD • Kwang-Nam Jin, MD, PhD • KunYoung Lim, MD, PhD • Thienkai HuyVu, MD, PhD • Jae Ho Sohn, MD • Sangheum Hwang, PhD • Jin Mo Goo, MD, PhD • Chang Min Park, MD, PhD From the Department of Radiology and Institute of Radiation Medicine, Seoul National University Hospital and College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul 03080, Republic of Korea (J.G.N., E.J.H., J.M.G., C.M.P.); Lunit Incorporated, Seoul, Republic of Korea (S.P.); Department of Radiology, Armed Forces Seoul Hospital, Seoul, Republic of Korea (J.H.L.); Department of Radiology, Seoul National University Boramae Medical Center, Seoul, Republic of Korea (K.N.J.); Department of Radiology, National Cancer Center, Goyang, Republic of Korea (K.Y.L.); Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, Calif (T.H.V., J.H.S.); and Department of Industrial & Information Systems Engineering, Seoul National University of Science and Technology, Seoul, Republic of Korea (S.H.). Received January 30, 2018; revision requested March 20; revision received July 29; accepted August 6. Address correspondence to C.M.P. (e-mail: cmpark.morphius@gmail.com). Study supported by SNUH Research Fund and Lunit (06–2016–3000) and by Seoul Research and Business Development Program (FI170002). *J.G.N. and S.P. contributed equally to this work. Conflicts of interest are listed at the end of this article. 
Radiology 2018; 00:1–11 • https://doi.org/10.1148/radiol.2018180237 • Content codes: Purpose: To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists. Materials and Methods: For this retrospective study, DLAD was developed by using 43,292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34,067:9,225) in 34,676 patients (healthy-to-nodule ratio, 30,784:3,892; 19,230 men [mean age, 52.8 years; age range, 18–99 years]; 15,446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared. Results: On the one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD ranged from 0.92 to 0.99 (AUROC) and from 0.831 to 0.924 (JAFROC FOM), respectively.
DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05). Conclusion: This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians' performances when used as a second reader. ©RSNA, 2018 Online supplemental material is available for this article. • 43,292 chest PA radiographs (normal:nodule = 34,067:9,225) • Labeled/annotated by 13 board-certified radiologists • DLAD was validated on 1 internal + 4 external datasets • Seoul National University Hospital / Boramae Medical Center / National Cancer Center / UCSF • Classification / lesion localization • AI vs. physicians vs. AI + physicians • Compared with physicians across a range of experience levels • Non-radiology physicians / radiology residents • Board-certified radiologists / thoracic radiologists
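The AUROC reported throughout the study is equivalent to the probability that a randomly chosen nodule radiograph receives a higher algorithm score than a randomly chosen normal one. A minimal, dependency-free sketch of that rank-based definition (the function name and toy scores are illustrative, not from the study):

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy check: perfect separation gives 1.0; an uninformative tie gives 0.5.
print(auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
print(auroc([1, 0], [0.5, 0.5]))                  # 0.5
```

The same pairwise-ranking view is why an AUROC of 0.92–0.99 means DLAD almost always scores a nodule radiograph above a normal one, regardless of any single operating threshold.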
  • 130.
Nam et al Figure 1: Images in a 78-year-old female patient with a 1.9-cm part-solid nodule at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by 11 of 18 observers. (b) At contrast-enhanced CT examination, biopsy confirmed lung adenocarcinoma (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional five radiologists and an elevation in its confidence by eight radiologists. Figure 2: Images in a 64-year-old male patient with a 2.2-cm lung adenocarcinoma at the left upper lobe. (a) The nodule was faintly visible on the chest radiograph (arrowheads) and was detected by seven of 18 observers. (b) Biopsy confirmed lung adenocarcinoma in the left upper lobe on contrast-enhanced CT image (arrow). (c) DLAD reported the nodule with a confidence level of 2, resulting in its detection by an additional two radiologists and an elevated confidence level of the nodule by two radiologists.
  • 132.
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules
Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
Test 1 = physicians alone; Test 2 = physicians assisted by DLAD. Each test reports radiograph classification (AUROC) and nodule detection (JAFROC FOM). The first P pair compares DLAD versus Test 1, the second compares Test 1 versus Test 2 (classification / detection).

Nonradiology physicians
  Observer 1   Test 1: 0.77 / 0.716   P: <.001 / <.001   Test 2: 0.91 / 0.853   P: <.001 / <.001
  Observer 2   Test 1: 0.78 / 0.657   P: <.001 / <.001   Test 2: 0.90 / 0.846   P: <.001 / <.001
  Observer 3   Test 1: 0.80 / 0.700   P: <.001 / <.001   Test 2: 0.88 / 0.783   P: <.001 / <.001
  Group        Test 1 FOM: 0.691  P: <.001*              Test 2 FOM: 0.828  P: <.001*
Radiology residents
  Observer 4   Test 1: 0.78 / 0.767   P: <.001 / <.001   Test 2: 0.80 / 0.785   P: .02 / .03
  Observer 5   Test 1: 0.86 / 0.772   P: .001 / <.001    Test 2: 0.91 / 0.837   P: .02 / <.001
  Observer 6   Test 1: 0.86 / 0.789   P: .05 / .002      Test 2: 0.86 / 0.799   P: .08 / .54
  Observer 7   Test 1: 0.84 / 0.807   P: .01 / .003      Test 2: 0.91 / 0.843   P: .003 / .02
  Observer 8   Test 1: 0.87 / 0.797   P: .10 / .003      Test 2: 0.90 / 0.845   P: .03 / .001
  Observer 9   Test 1: 0.90 / 0.847   P: .52 / .12       Test 2: 0.92 / 0.867   P: .04 / .03
  Group        Test 1 FOM: 0.790  P: <.001*              Test 2 FOM: 0.867  P: <.001*
Board-certified radiologists
  Observer 10  Test 1: 0.87 / 0.836   P: .05 / .01       Test 2: 0.90 / 0.865   P: .004 / .002
  Observer 11  Test 1: 0.83 / 0.804   P: <.001 / <.001   Test 2: 0.84 / 0.817   P: .03 / .04
  Observer 12  Test 1: 0.88 / 0.817   P: .18 / .005      Test 2: 0.91 / 0.841   P: .01 / .01
  Observer 13  Test 1: 0.91 / 0.824   P: >.99 / .02      Test 2: 0.92 / 0.836   P: .51 / .24
  Observer 14  Test 1: 0.88 / 0.834   P: .14 / .03       Test 2: 0.88 / 0.840   P: .87 / .23
  Group        Test 1 FOM: 0.821  P: .02*                Test 2 FOM: 0.840  P: .01*
Thoracic radiologists
  Observer 15  Test 1: 0.94 / 0.856   P: .15 / .21       Test 2: 0.96 / 0.878   P: .08 / .03
  Observer 16  Test 1: 0.92 / 0.854   P: .60 / .17       Test 2: 0.93 / 0.872   P: .34 / .02
  Observer 17  Test 1: 0.86 / 0.820   P: .02 / .01       Test 2: 0.88 / 0.838   P: .14 / .12
  Observer 18  Test 1: 0.84 / 0.800   P: <.001 / <.001   Test 2: 0.87 / 0.827   P: .02 / .02
  Group        Test 1 FOM: 0.833  P: .08*                Test 2 FOM: 0.854  P: <.001*

Note.—Observer 4 had 1 year of experience; observers 5 and 6 had 2 years of experience; observers 7–9 had 3 years of experience; observers 10–12 had 7 years of experience; observers 13 and 14 had 8 years of experience; observer 15 had 26 years of experience; observer 16 had 13 years of experience; and observers 17 and 18 had 9 years of experience.
Observers 1–3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine, respectively. Physicians alone vs. AI (p value) • Physicians + AI • Physicians vs. physicians + AI (p value) • Radiology residents: years 1, 2, and 3 • Non-radiology physicians: 4th-year residents in obstetrics-gynecology, orthopedic surgery, and internal medicine • Board-certified radiologists: 7 and 8 years of experience • Board-certified thoracic radiologists: 26, 13, and 9 years of experience
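The per-observer numbers in Table 3 reproduce the summary statistic quoted in the abstract (mean JAFROC FOM improvement 0.043; range 0.006–0.190). A quick plain-Python check, with the FOM pairs transcribed from the table:

```python
# (Test 1 FOM, Test 2 FOM) for observers 1-18, transcribed from Table 3.
fom = [(0.716, 0.853), (0.657, 0.846), (0.700, 0.783),   # non-radiology physicians
       (0.767, 0.785), (0.772, 0.837), (0.789, 0.799),   # radiology residents
       (0.807, 0.843), (0.797, 0.845), (0.847, 0.867),
       (0.836, 0.865), (0.804, 0.817), (0.817, 0.841),   # board-certified radiologists
       (0.824, 0.836), (0.834, 0.840),
       (0.856, 0.878), (0.854, 0.872), (0.820, 0.838),   # thoracic radiologists
       (0.800, 0.827)]

# Per-observer improvement from Test 1 (alone) to Test 2 (with DLAD).
deltas = [round(t2 - t1, 3) for t1, t2 in fom]
mean_gain = round(sum(deltas) / len(deltas), 3)
print(mean_gain, min(deltas), max(deltas))  # 0.043 0.006 0.189
```

Every delta is positive, and the mean comes out to the abstract's 0.043; the rounded table values give a range of 0.006–0.189, matching the abstract's 0.006–0.190 up to rounding of the underlying unrounded FOMs.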
  • 133.
• Using the AI as a second reader improved physicians' accuracy • Classification: improved for 17 of 18 physicians (statistically significant in 15 of 18, P < .05) • Nodule detection: improved for 18 of 18 physicians (statistically significant in 14 of 18, P < .05)
  • 134.
  • 135.
Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer David F. Steiner, MD, PhD,* Robert MacDonald, PhD,* Yun Liu, PhD,* Peter Truszkowski, MD,* Jason D. Hipp, MD, PhD, FCAP,* Christopher Gammage, MS,* Florence Thng, MS,† Lily Peng, MD, PhD,* and Martin C. Stumpe, PhD* Abstract: Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a washout period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018).
Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow. Key Words: artificial intelligence, machine learning, digital pathology, breast cancer, computer aided detection (Am J Surg Pathol 2018;00:000–000) The regulatory approval and gradual implementation of whole-slide scanners has enabled the digitization of glass slides for remote consults and archival purposes.1 Digitization alone, however, does not necessarily improve the consistency or efficiency of a pathologist's primary workflow. In fact, image review on a digital medium can be slightly slower than on glass, especially for pathologists with limited digital pathology experience.2 However, digital pathology and image analysis tools have already demonstrated potential benefits, including the potential to reduce inter-reader variability in the evaluation of breast cancer HER2 status.3,4 Digitization also opens the door for assistive tools based on Artificial Intelligence (AI) to improve efficiency and consistency, decrease fatigue, and increase accuracy.5 Among AI technologies, deep learning has demonstrated strong performance in many automated image-recognition applications.6–8 Recently, several deep learning–based algorithms have been developed for the detection of breast cancer metastases in lymph nodes as well as for other applications in pathology.9,10 Initial findings suggest that some algorithms can even exceed a pathologist's sensitivity for detecting individual cancer foci in digital images.
However, this sensitivity gain comes at the cost of increased false positives, potentially limiting the utility of such algorithms for automated clinical use.11 In addition, deep learning algorithms are inherently limited to the task for which they have been specifically trained. While we have begun to understand the strengths of these algorithms (such as exhaustive search) and their weaknesses (sensitivity to poor optical focus, tumor mimics; manuscript under review), the potential clinical utility of such algorithms has not been thoroughly examined. While an accurate algorithm alone will not necessarily aid pathologists or improve clinical interpretation, these benefits may be achieved through thoughtful and appropriate integration of algorithm predictions into the clinical workflow.8 From the *Google AI Healthcare; and †Verily Life Sciences, Mountain View, CA. D.F.S., R.M., and Y.L. are co-first authors (equal contribution). Work done as part of the Google Brain Healthcare Technology Fellowship (D.F.S. and P.T.). Conflicts of Interest and Source of Funding: D.F.S., R.M., Y.L., P.T., J.D.H., C.G., F.T., L.P., M.C.S. are employees of Alphabet and have Alphabet stock. Correspondence: David F. Steiner, MD, PhD, Google AI Healthcare, 1600 Amphitheatre Way, Mountain View, CA 94043 (e-mail: davesteiner@google.com). Supplemental Digital Content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's website, www.ajsp.com. Copyright © 2018 The Author(s). Published by Wolters Kluwer Health, Inc. This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
ORIGINAL ARTICLE Am J Surg Pathol • Volume 00, Number 00, 2018 www.ajsp.com | 1 • LYNA (LYmph Node Assistant), a pathology AI developed by Google • Targets lymph node metastasis of breast cancer • A study demonstrating the synergy of pathologist + AI • Measured accuracy (sensitivity), review time, and perceived difficulty of review (for micrometastases)
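In assisted mode, LYNA outlines regions with a high predicted likelihood of tumor. A dependency-free sketch of that kind of post-processing step, thresholding a probability map and grouping pixels into connected regions with bounding boxes (the grid, threshold, and function names are illustrative assumptions, not Google's implementation):

```python
from collections import deque

def outline_regions(heatmap, threshold=0.5):
    """Threshold a 2D tumor-probability map and return one bounding box
    (rmin, cmin, rmax, cmax) per 4-connected high-probability region."""
    rows, cols = len(heatmap), len(heatmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if heatmap[r][c] >= threshold and not seen[r][c]:
                # Flood-fill one region, tracking its extent.
                queue = deque([(r, c)])
                seen[r][c] = True
                rmin = rmax = r
                cmin = cmax = c
                while queue:
                    y, x = queue.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and heatmap[ny][nx] >= threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes

demo = [[0.1, 0.9, 0.8],
        [0.2, 0.1, 0.1],
        [0.7, 0.1, 0.1]]
print(outline_regions(demo))  # [(0, 1, 0, 2), (2, 0, 2, 0)]
```

The returned boxes are what would be drawn over the slide image for the pathologist to review, so the algorithm acts as a pointer rather than a final decision-maker.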
  • 136.
FIGURE 3. Improved metastasis detection with algorithm assistance. A, Performance across all images by image category and assistance modality; error bars indicate SE. The performance metric corresponds to specificity for negative cases and sensitivity for micrometastases (micromet) and macrometastases (macromet); for micrometastases, sensitivity was significantly higher with assistance (p=0.02). B, Operating point of individual pathologists with and without assistance for micrometastases and negative cases, overlaid on the receiver operating characteristic curve of the algorithm. AUC indicates area under the curve. Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. www.ajsp.com | 5 • Sensitivity: significantly improved for micrometastases when AI assistance was used • Changes for negative cases and macrometastases were not significant • AUC: pathologist + AI was slightly higher than either the pathologist alone or the algorithm alone
  • 137.
Underlying these exciting advances, however, is the important notion that these algorithms do not replace the breadth and contextual knowledge of human pathologists. Algorithm assistance improved sensitivity from 83% to 91% and resulted in higher overall diagnostic accuracy than that of either unassisted pathologist interpretation or the computer algorithm alone. FIGURE 5. Average review time per image decreases with assistance. A, Average review time per image across all pathologists, analyzed by category. Black circles are average times with assistance; gray triangles represent average times without assistance. Error bars indicate 95% confidence interval. B, Micrometastasis time of review decreases for nearly all images with assistance. Circles represent average review time for each individual micrometastasis image, averaged across the 6 pathologists by assistance modality. The dashed lines connect the points corresponding to the same image with and without assistance. The 2 images that were not reviewed faster on average with assistance are represented with red dot-dash lines. Vertical lines of the box represent quartiles, and the diamond indicates the average review time for micrometastases in that modality. Micromet indicates micrometastasis; macromet, macrometastasis. 8 | www.ajsp.com Copyright © 2018 Wolters Kluwer Health, Inc. All rights reserved. • Review time (per image): decreased significantly with AI assistance for negative cases (p=0.018) and micrometastases (p=0.002) • In particular, micrometastasis review time fell by roughly half, from about 2 minutes to 1 minute • ITC (isolated tumor cells) and macrometastases showed no significant change
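The review-time comparison above is paired: each image is read by the same pathologists with and without assistance. For small samples such a comparison can be tested exactly by enumerating sign flips of the per-pair differences. A sketch with made-up times (the numbers are hypothetical, not the study's data):

```python
from itertools import product

def paired_sign_flip_p(before, after):
    """Exact two-sided permutation p-value for paired data: under the null,
    the sign of each per-pair difference is random, so enumerate all 2^n
    sign assignments and count those at least as extreme as observed."""
    diffs = [b - a for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    hits = sum(1 for signs in product((1, -1), repeat=len(diffs))
               if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed)
    return hits / 2 ** len(diffs)

# Hypothetical per-image review times in seconds, unassisted vs. assisted.
unassisted = [118, 95, 130, 110, 102, 125, 119, 108]
assisted   = [63, 55, 70, 60, 57, 60, 61, 56]
print(paired_sign_flip_p(unassisted, assisted))  # 0.0078125
```

With all eight differences pointing the same way, only the two all-same-sign assignments are as extreme as the observed one, giving p = 2/256; the study itself reports its micrometastasis comparison at p = 0.002 using its own analysis.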
  • 138.
• Subjective difficulty of review • Became significantly easier only for micrometastases • Differences in the other categories were not statistically significant. Even the best algorithms would need to integrate into existing clinical workflows in order to improve patient care. In this proof-of-concept study, we investigated the impact of a computer assistance tool for the interpretation of digitized H&E slides, and show that a digital tool developed to assist with the identification of lymph node metastases can indeed augment the efficiency and accuracy of pathologists. In regards to accuracy, algorithm assistance improved the sensitivity of detection of micrometastases.
TABLE 3. Average Obviousness Scores to Assess the Difficulty of Each Case by Image Category and Assistance Modality (average obviousness score, 95% CI)
  Negative (24 images):        unassisted 67.5 (63.6–71.3)   assisted 72.0 (68.7–75.3)   P = 0.29
  Isolated tumor cells (8):    unassisted 55.6 (47.7–63.5)   assisted 50.4 (42.2–58.6)   P = 0.47
  Micrometastasis (19):        unassisted 63.1 (58.3–67.9)   assisted 83.6 (80.3–86.9)   P = 0.0005
  Macrometastasis (19):        unassisted 90.1 (86.4–93.7)   assisted 93.1 (90.0–96.1)   P = 0.16
Bold values in the original indicate statistically significant differences. Am J Surg Pathol • Volume 00, Number 00, 2018
  • 139.
AI: How Should It Be Used? (2) Blending into the Workflow
  • 140.
  • 141.
Access to Pathology AI algorithms is limited. Adoption barriers for digital pathology: • Expensive scanners • IT infrastructure required • Disrupts existing workflows • Not all clinical needs addressed (speed, focus, etc.)
  • 143.
Figures. Figure 1: System overview. 1: Schematic sketch of the whole device. 2: A photo of the actual implementation. An Augmented Reality Microscope for Real-time Automated Detection of Cancer https://research.googleblog.com/2018/04/an-augmented-reality-microscope.html
  • 144.
An Augmented Reality Microscope for Cancer Detection https://www.youtube.com/watch?v=9Mz84cwVmS0
  • 145.
  • 146.
An Augmented Reality Microscope for Real-time Automated Detection of Cancer • PR quantification • Mitosis counting on H&E slide • Measurement of tumor size • Identification of H. pylori • Identification of Mycobacterium • Identification of prostate cancer region with estimation of percentage tumor involvement • Ki67 quantification • P53 quantification • CD8 quantification https://research.googleblog.com/2018/04/an-augmented-reality-microscope.html
  • 147.
AI: How Should It Be Used? (3) A Starting Point for New Medical Research
  • 148.
• Prediction target: "Will a first cardiovascular event occur within the next 10 years?" • Prospective cohort study: 378,256 patients in the UK • The first large-scale study to predict disease with machine learning from routine clinical data • Compared the accuracy of the existing ACC/AHA guideline against four machine-learning algorithms: • Random forest; logistic regression; gradient boosting; neural network
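Logistic regression, one of the four algorithms compared, fits in a few lines of plain Python. A toy gradient-descent sketch on a single made-up risk factor (purely illustrative; nothing like the 378,256-patient cohort or its feature set):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Predicted event probability for risk-factor value x."""
    return 1 / (1 + math.exp(-(w * x + b)))

# Toy data: one standardized risk factor; higher values mean more events.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

After training, `predict(w, b, 2.0)` is close to 1 and `predict(w, b, -2.0)` close to 0; real risk models do the same thing over dozens of clinical variables instead of one.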
  • 149.
Can machine-learning improve cardiovascular risk prediction using routine clinical data? Stephen F. Weng et al., PLoS ONE 2017. The ACC/AHA baseline model achieved a sensitivity of 62.7% and PPV of 17.1%. The random forest algorithm resulted in a net increase of 191 CVD cases from the baseline model, increasing the sensitivity to 65.3% and PPV to 17.8%, while logistic regression resulted in a net increase of 324 CVD cases (sensitivity 67.1%; PPV 18.3%). Gradient boosting machines and neural networks performed best, resulting in a net increase of 354 (sensitivity 67.5%; PPV 18.4%) and 355 (sensitivity 67.5%; PPV 18.4%) CVD cases correctly predicted, respectively. The ACC/AHA baseline model correctly predicted 53,106 non-cases from 75,585 total non-cases, resulting in a specificity of 70.3% and NPV of 95.1%. Table 3. Top 10 risk factor variables for CVD algorithms listed in descending order of coefficient effect size (ACC/AHA; logistic regression), weighting (neural networks), or selection frequency (random forest, gradient boosting machines). Algorithms were derived from a training cohort of 295,267 patients.
ACC/AHA (men): Age; Total Cholesterol; HDL Cholesterol; Smoking; Age × Total Cholesterol; Treated Systolic Blood Pressure; Age × Smoking; Age × HDL Cholesterol; Untreated Systolic Blood Pressure; Diabetes
ACC/AHA (women): Age; HDL Cholesterol; Total Cholesterol; Smoking; Age × HDL Cholesterol; Age × Total Cholesterol; Treated Systolic Blood Pressure; Untreated Systolic Blood Pressure; Age × Smoking; Diabetes
ML, logistic regression: Ethnicity; Age; SES (Townsend Deprivation Index); Gender; Smoking; Atrial Fibrillation; Chronic Kidney Disease; Rheumatoid Arthritis; Family history of premature CHD; COPD
ML, random forest: Age; Gender; Ethnicity; Smoking; HDL Cholesterol; HbA1c; Triglycerides; SES (Townsend Deprivation Index); BMI; Total Cholesterol
ML, gradient boosting machines: Age; Gender; Ethnicity; Smoking; HDL Cholesterol; Triglycerides; Total Cholesterol; HbA1c; Systolic Blood Pressure; SES (Townsend Deprivation Index)
ML, neural networks: Atrial Fibrillation; Ethnicity; Oral Corticosteroid Prescribed; Age; Severe Mental Illness; SES (Townsend Deprivation Index); Chronic Kidney Disease; BMI missing; Smoking; Gender
Italics in the original marked protective factors. https://doi.org/10.1371/journal.pone.0174944.t003 PLOS ONE | https://doi.org/10.1371/journal.pone.0174944 April 4, 2017 8 / 14 • Only some of the risk factors in the existing ACC/AHA guideline were also selected by the machine-learning algorithms • Diabetes, however, was not included in any of the four models • Factors absent from existing risk-prediction tools were newly selected, such as: • COPD, severe mental illness, and prescribing of oral corticosteroids • Biomarkers such as triglyceride level
  • 150.
Can machine-learning improve cardiovascular risk prediction using routine clinical data? Stephen F. Weng et al., PLoS ONE 2017. The net increase in non-cases correctly predicted compared to the baseline ACC/AHA model ranged from 191 non-cases for the random forest algorithm to 355 non-cases for the neural networks. Full details on classification analysis can be found in S2 Table. Discussion: Compared to an established AHA/ACC risk prediction algorithm, we found all machine-learning algorithms tested were better at identifying individuals who will develop CVD and those that will not. Unlike established approaches to risk prediction, the machine-learning methods used were not limited to a small set of risk factors, and incorporated more pre-existing … Table 4. Performance of the machine-learning (ML) algorithms predicting 10-year cardiovascular disease (CVD) risk, derived from applying training algorithms on the validation cohort of 82,989 patients. Higher c-statistics indicate better discrimination. The baseline (BL) ACC/AHA 10-year risk prediction algorithm is provided for comparison.
  Algorithm                        AUC c-statistic   SE*     95% CI          Change from baseline
  BL: ACC/AHA                      0.728             0.002   0.723–0.735     —
  ML: Random Forest                0.745             0.003   0.739–0.750     +1.7%
  ML: Logistic Regression          0.760             0.003   0.755–0.766     +3.2%
  ML: Gradient Boosting Machines   0.761             0.002   0.755–0.766     +3.3%
  ML: Neural Networks              0.764             0.002   0.759–0.769     +3.6%
*Standard error estimated by jackknife procedure [30]. https://doi.org/10.1371/journal.pone.0174944.t004
• All four machine-learning models were more accurate than the existing ACC/AHA guideline • Neural networks were the most accurate, with AUC = 0.764 • "Using this model would have prevented an additional 355 cardiovascular events" • Accuracy could improve further with deep learning • Additional risk factors, such as genetic information, could also be incorporated
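The headline numbers above follow directly from the reported counts and c-statistics; a quick arithmetic check in plain Python (only quantities quoted in this excerpt are used):

```python
# Baseline ACC/AHA model: 53,106 of 75,585 non-cases correctly predicted.
true_negatives = 53_106
total_non_cases = 75_585
specificity = true_negatives / total_non_cases
print(round(specificity * 100, 1))  # 70.3

# Absolute AUC change of each ML algorithm over the 0.728 baseline (Table 4).
baseline_auc = 0.728
ml_auc = {"Random Forest": 0.745, "Logistic Regression": 0.760,
          "Gradient Boosting Machines": 0.761, "Neural Networks": 0.764}
changes = {name: round(auc - baseline_auc, 3) for name, auc in ml_auc.items()}
print(changes)
```

The recomputed specificity matches the paper's 70.3%, and the AUC deltas reproduce the +1.7% to +3.6% column of Table 4.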
  • 151.
Constructing higher-level contextual/relational features: relationships between epithelial nuclear neighbors; relationships between morphologically regular and irregular nuclei; relationships between epithelial and stromal objects; relationships between epithelial nuclei and cytoplasm; characteristics of stromal nuclei and stromal matrix; characteristics of epithelial nuclei and epithelial cytoplasm. Building an epithelial/stromal classifier. Basic image processing and feature construction: H&E image; image broken into superpixels; nuclei identified within each superpixel. Relationships of contiguous epithelial regions with underlying nuclear objects. Learning an image-based model to predict survival. TMAs contain 0.6-mm-diameter cores (median of two cores per case) that represent only a small sample of the full tumor. We acquired data from two separate and independent cohorts: Netherlands Cancer Institute (NKI; 248 patients) and Vancouver General Hospital (VGH; 328 patients). Unlike previous work in cancer morphometry (18–21), our image analysis pipeline was not limited to a predefined set of morphometric features selected by pathologists. Rather, C-Path measures an extensive, quantitative feature set from the breast cancer epithelium and the stroma (Fig. 1). Our image processing system first performed an automated, hierarchical scene segmentation that generated thousands of measurements, including both standard morphometric descriptors of image objects and higher-level contextual, relational, and global image features. The pipeline consisted of three stages (Fig. 1, A to C, and tables S8 and S9).
First, we used a set of processing steps to separate the tissue from the background, partition the image into small regions of coherent appearance known as superpixels, find nuclei within the superpixels, and construct basic cellular morphologic properties (epithelial regular nuclei = red; epithelial atypical nuclei = pale blue; epithelial cytoplasm = purple; stromal matrix = green; stromal round nuclei = dark green; stromal spindled nuclei = teal blue; unclassified regions = dark gray; spindled nuclei in unclassified regions = yellow; round nuclei in unclassified regions = gray; background = white). After the classification of each image object, a rich feature set is constructed. (D) Learning an image-based model to predict survival: processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to build a 5-year-survival predictive model via L1-regularized logistic regression. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as high or low risk of death by 5 years.
Digital Pathologist. Sci Transl Med. 2011 Nov 9;3(108):108ra113
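C-Path's final modeling step is an L1-regularized logistic regression that maps image-derived features to 5-year survival status. The sketch below is a minimal, self-contained stand-in for that step, not the C-Path code: the feature vectors are synthetic, and the ISTA-style proximal update is one standard way to obtain the sparse weights an L1 penalty implies.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks weights toward exact zero.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.1, iters=3000):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n  # bias term is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, gw)]
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0

# Synthetic stand-in for image features: columns 0-1 carry the survival
# signal, columns 2-3 are noise that the L1 penalty should suppress.
random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(80)]
y = [1 if xi[0] + xi[1] > 1.0 else 0 for xi in X]
w, b = fit_l1_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

With a strong enough penalty, the noise weights shrink toward zero while the informative weights survive, which is the behavior C-Path relies on to select a small prognostic feature set from thousands of candidate morphologic descriptors.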
  • 152.
Digital Pathologist. Sci Transl Med. 2011 Nov 9;3(108):108ra113

Previous approaches primarily characterized epithelial nuclear characteristics, such as size, color, and texture (21, 36). In contrast, after initial filtering of images to ensure high-quality TMA images and training of the C-Path models using expert-derived image annotations (epithelium and stroma labels to build the epithelial-stromal classifier, and survival time and survival status to build the prognostic model), our image analysis system is automated with no manual steps, which greatly increases its scalability. Additionally, in contrast to previous approaches, our system measures thousands of morphologic descriptors, enabling the identification of prognostic features whose significance was not previously recognized. Using our system, we built an image-based prognostic model on the NKI data set and showed that in this patient cohort the model was a strong predictor of survival and provided significant additional prognostic information to clinical, molecular, and pathological prognostic factors in a multivariate model. We also demonstrated that the image-based prognostic model, built using the NKI data set, is a strong prognostic factor on another, independent data set.

[Fig. 5. Top epithelial features: (A) SD of the ratio of the pixel intensity SD to the mean intensity for pixels within a ring of the center of epithelial nuclei; (B) the sum of the number of unclassified objects; (C) SD of the maximum blue pixel value for atypical epithelial nuclei; (D) maximum distance between atypical epithelial nuclei; (E) minimum elliptic fit of epithelial contiguous regions; (F) SD of distance between epithelial cytoplasmic and nuclear objects; (G) average border between epithelial cytoplasmic objects; (H) maximum value of the minimum green pixel intensity value in epithelial contiguous regions.]
The eight panels in the figure (A to H) each show one of the top-ranking epithelial features from the bootstrap analysis. Left panels, improved prognosis; right panels, worse prognosis. (A) SD of the (SD of intensity/mean intensity) for pixels within a ring of the center of epithelial nuclei. Left, relatively consistent nuclear intensity pattern (low score); right, great nuclear intensity diversity (high score). (B) Sum of the number of unclassified objects. Red, epithelial regions; green, stromal regions; no overlaid color, unclassified region. Left, few unclassified objects (low score); right, higher number of unclassified objects (high score). (C) SD of the maximum blue pixel value for atypical epithelial nuclei. Left, high score; right, low score. (D) Maximum distance between atypical epithelial nuclei. Left, high score; right, low score. (Insets) Red, atypical epithelial nuclei; black, typical epithelial nuclei. (E) Minimum elliptic fit of epithelial contiguous regions. Left, high score; right, low score. (F) SD of distance between epithelial cytoplasmic and nuclear objects. Left, high score; right, low score. (G) Average border between epithelial cytoplasmic objects. Left, high score; right, low score. (H) Maximum value of the minimum green pixel intensity value in epithelial contiguous regions. Left, low score indicating black pixels within epithelial region; right, higher score indicating presence of epithelial regions lacking black pixels.

...and stromal matrix throughout the image, with thin cords of epithelial cells infiltrating through stroma across the image, so that each stromal matrix region borders a relatively constant proportion of epithelial and stromal regions. The stromal feature with the second largest coefficient (Fig. 4B) was the sum of the minimum green intensity value of stromal-contiguous regions.
This feature received a value of zero when stromal regions contained dark pixels (such as inflammatory nuclei). The feature received a positive value when stromal objects were devoid of dark pixels. This feature provided information about the relationship between stromal cellular composition and prognosis and suggested that the presence of inflammatory cells in the stroma is associated with poor prognosis, a finding consistent with previous observations (32). The third most significant stromal feature (Fig. 4C) was a measure of the relative border between spindled stromal nuclei to round stromal nuclei, with an increased relative border of spindled stromal nuclei to round stromal nuclei associated with worse overall survival. Although the biological underpinning of this morphologic feature is currently not known, this analysis suggested that spatial relationships between different populations of stromal cell types are associated with breast cancer progression.

Reproducibility of C-Path 5YS model predictions on samples with multiple TMA cores

For the C-Path 5YS model (which was trained on the full NKI data set), we assessed the intrapatient agreement of model predictions when predictions were made separately on each image contributed by patients in the VGH data set. For the 190 VGH patients who contributed two images with complete image data, the binary predictions (high or low risk) on the individual images agreed with each other for 69% (131 of 190) of the cases and agreed with the prediction on the averaged data for 84% (319 of 380) of the images. Using the continuous prediction score (which ranged from 0 to 100), the median of the absolute difference in prediction score among the patients with replicate images was 5%, and the Spearman correlation among replicates was 0.27 (P = 0.0002) (fig. S3).
This degree of intrapatient agreement is only moderate, and these findings suggest significant intrapatient tumor heterogeneity, which is a cardinal feature of breast carcinomas (33-35). Qualitative visual inspection of images receiving discordant scores suggested that intrapatient variability in both the epithelial and the stromal components is likely to contribute to discordant scores for the individual images. These differences appeared to relate both to the proportions of the epithelium and stroma and to the appearance of the epithelium and stroma. Last, we sought to analyze whether survival predictions were more accurate on the VGH cases that contributed multiple cores compared to the cases that contributed only a single core. This analysis showed that the C-Path 5YS model showed significantly improved prognostic prediction accuracy on the VGH cases for which we had multiple images compared to the cases that contributed only a single image (Fig. 7). Together, these findings show a significant degree of intrapatient variability and indicate that increased tumor sampling is associated with improved model performance.

[Fig. 4. Top stromal features associated with survival. (A) Variability in absolute difference in intensity between stromal matrix regions and neighbors. Top panel, high score (24.1); bottom panel, low score (10.5). Right panels, stromal matrix objects colored blue (low), green (medium), or white (high) according to each object's absolute difference in intensity to neighbors.]
Discovery of new criteria for predicting breast cancer prognosis
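The reproducibility numbers above (69% binary agreement and a Spearman correlation of 0.27 between replicate cores) come from comparing per-image predictions for the same patient. As a rough illustration, assuming hypothetical replicate scores on a 0-100 scale rather than the actual VGH data, both metrics can be computed as follows:

```python
import math

def ranks(xs):
    # Average ranks; tied values share the mean of their rank positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def spearman(a, b):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(ranks(a), ranks(b))

def binary_agreement(a, b, cutoff=50):
    # Fraction of replicate pairs landing on the same side of the risk cutoff.
    return sum((x >= cutoff) == (y >= cutoff) for x, y in zip(a, b)) / len(a)

# Hypothetical replicate prediction scores for six patients (0-100 scale).
core1 = [10, 35, 40, 55, 70, 90]
core2 = [15, 30, 52, 48, 75, 88]
rho = spearman(core1, core2)
agree = binary_agreement(core1, core2)
```

Here two of the six toy patients cross the 50-point cutoff between cores, so the binary agreement drops even though the rank correlation stays high; the paper's much lower Spearman of 0.27 reflects real intrapatient tumor heterogeneity.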
  • 153.
PRECISION MEDICINE

Identification of type 2 diabetes subgroups through topological analysis of patient similarity

Li Li,1 Wei-Yi Cheng,1 Benjamin S. Glicksberg,1 Omri Gottesman,2 Ronald Tamler,3 Rong Chen,1 Erwin P. Bottinger,2 Joel T. Dudley1,4*

Type 2 diabetes (T2D) is a heterogeneous complex disease affecting more than 29 million Americans alone, with a rising prevalence trending toward steady increases in the coming decades. Thus, there is a pressing clinical need to improve early prevention and clinical management of T2D and its complications. Clinicians have understood that patients who carry the T2D diagnosis have a variety of phenotypes and susceptibilities to diabetes-related complications. We used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals. We successfully identified three distinct subgroups of T2D from topology-based patient-patient networks. Subtype 1 was characterized by the T2D complications diabetic nephropathy and diabetic retinopathy; subtype 2 was enriched for cancer malignancy and cardiovascular diseases; and subtype 3 was associated most strongly with cardiovascular diseases, neurological diseases, allergies, and HIV infections. We performed a genetic association analysis of the emergent T2D subtypes to identify subtype-specific genetic markers and identified 1279, 1227, and 1338 single-nucleotide polymorphisms (SNPs) that mapped to 425, 322, and 437 unique genes specific to subtypes 1, 2, and 3, respectively. By assessing the human disease-SNP association for each subtype, the enriched phenotypes and biological functions at the gene level for each subtype matched with the disease comorbidities and clinical differences that we identified through EMRs.
Our approach demonstrates the utility of applying the precision medicine paradigm in T2D and the promise of extending the approach to the study of other complex, multifactorial diseases.

INTRODUCTION

Type 2 diabetes (T2D) is a complex, multifactorial disease that has emerged as an increasingly prevalent worldwide health concern associated with high economic and physiological burdens. An estimated 29.1 million Americans (9.3% of the population) had some form of diabetes in 2012 (up 13% from 2010), with T2D representing up to 95% of all diagnosed cases (1, 2). Risk factors for T2D include obesity, family history of diabetes, physical inactivity, ethnicity, and advanced age (1, 2). Diabetes and its complications now rank among the leading causes of death in the United States (2). In fact, diabetes is the leading cause of nontraumatic foot amputation, adult blindness, and need for kidney dialysis, and multiplies risk for myocardial infarction, peripheral artery disease, and cerebrovascular disease (3-6). The total estimated direct medical cost attributable to diabetes in the United States in 2012 was $176 billion, with an estimated $76 billion attributable to hospital inpatient care alone. There is a great need to improve understanding of T2D and its complex factors to facilitate prevention, early detection, and improvements in clinical management. A more precise characterization of T2D patient populations can enhance our understanding of T2D pathophysiology (7, 8). Current clinical definitions classify diabetes into three major subtypes: type 1 diabetes (T1D), T2D, and maturity-onset diabetes of the young. Other subtypes based on phenotype bridge the gap between T1D and T2D, for example, latent autoimmune diabetes in adults (LADA) (7) and ketosis-prone T2D. The current categories indicate that the traditional definition of diabetes, especially T2D, might comprise additional subtypes with distinct clinical characteristics.
A recent analysis of the longitudinal Whitehall II cohort study demonstrated improved assessment of cardiovascular risks when subgrouping T2D patients according to glucose concentration criteria (9). Genetic association studies reveal that the genetic architecture of T2D is profoundly complex (10-12). Identified T2D-associated risk variants exhibit allelic heterogeneity and directional differentiation among populations (13, 14). The apparent clinical and genetic complexity and heterogeneity of T2D patient populations suggest that there are opportunities to refine the current, predominantly symptom-based, definition of T2D into additional subtypes (7). Because etiological and pathophysiological differences exist among T2D patients, we hypothesize that a data-driven analysis of a clinical population could identify new T2D subtypes and factors. Here, we develop a data-driven, topology-based approach to (i) map the complexity of patient populations using clinical data from electronic medical records (EMRs) and (ii) identify new, emergent T2D patient subgroups with subtype-specific clinical and genetic characteristics. We apply this approach to a data set comprising matched EMRs and genotype data from more than 11,000 individuals. Topological analysis of these data revealed three distinct T2D subtypes that exhibited distinct patterns of clinical characteristics and disease comorbidities. Further, we identified genetic markers associated with each T2D subtype and performed gene- and pathway-level analysis of subtype genetic associations. Biological and phenotypic features enriched in the genetic analysis corroborated clinical disparities observed among subgroups. Our findings suggest that data-driven, topological analysis of patient cohorts has utility in precision medicine efforts to refine our understanding of T2D toward improving patient care.

1 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 700 Lexington Ave., New York, NY 10065, USA.
2 Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA. 3 Division of Endocrinology, Diabetes, and Bone Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. 4 Department of Health Policy and Research, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA. *Corresponding author. E-mail: joel.dudley@mssm.edu

Science Translational Medicine, 28 October 2015, Vol 7 Issue 311, 311ra174

...and vision defects (RR, 1.32; range, 1.04 to 1.67), than were the other two subtypes (Table 2A). Patients in subtype 2 (n = 617) were more likely to associate with diseases of cancer of bronchus: lung (RR, 3.76; range, 1.14 to 12.39); malignant neoplasm without specification of site (RR, 3.46; range, 1.23 to 9.70); tuberculosis (RR, 2.93; range, 1.30 to 6.64); coronary atherosclerosis and other heart disease (RR, 1.28; range, 1.01 to 1.61); and other circulatory disease (RR, 1.27; range, 1.02 to 1.58), than were the other two subtypes (Table 2B). Patients in subtype 3 (n = 1096) were more often diagnosed with HIV infection (RR, 1.92; range, 1.30 to 2.85) and were associated with E codes (that is, external causes of injury care) (RR, 1.84; range, 1.41 to 2.39); aortic and peripheral arterial embolism or thrombosis (RR, 1.79; range, 1.18 to 2.71); hypertension with complications and secondary hypertension (RR, 1.66; range, 1.29 to 2.15); coronary atherosclerosis and other heart disease (RR, 1.41; range, 1.15 to 1.72); allergic reactions (RR, 1.42; range, 1.19 to 1.70); deficiency and other anemia (RR, 1.39; range, 1.14 to 1.68); and screening and history of mental health and substance abuse code (RR, 1.30; range, 1.07 to 1.58) (Table 2C).
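The subtype comparisons above are reported as relative risks with 95% ranges. A minimal sketch of a standard relative-risk computation, using made-up counts rather than the Mount Sinai Biobank data (the authors' exact procedure may differ):

```python
import math

def relative_risk(cases_in_subtype, subtype_total, cases_in_rest, rest_total, z=1.96):
    """Relative risk of a diagnosis in one subtype vs. the remaining patients,
    with an approximate 95% confidence interval computed on the log scale."""
    p1 = cases_in_subtype / subtype_total
    p0 = cases_in_rest / rest_total
    rr = p1 / p0
    # Standard error of log(RR) for a 2x2 table.
    se = math.sqrt(1 / cases_in_subtype - 1 / subtype_total
                   + 1 / cases_in_rest - 1 / rest_total)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical counts: 30/100 diagnosed within the subtype vs. 15/100 elsewhere.
rr, lo, hi = relative_risk(30, 100, 15, 100)
```

An RR of 2.0 with a range excluding 1.0 would indicate the diagnosis is significantly enriched in that subtype, which is how the comorbidity patterns above were read.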
Significant disease-genetic variant enrichments specific to T2D subtypes

We next evaluated the genetic variants significantly associated with each of the three subtypes. Observed genetic associations and gene-level [that is, single-nucleotide polymorphisms (SNPs) mapped to gene-level annotations] enrichments by hypergeometric analysis are considered independent of the ...

[Fig. 1. Patient and genotype networks. (A) Patient-patient network for topology patterns on 11,210 Biobank patients. Each node represents a single patient or a group of patients with significant similarity based on their clinical features. An edge between nodes indicates that the nodes share patients. Red represents enrichment for patients with a T2D diagnosis; blue represents non-enrichment for patients with a T2D diagnosis. (B) Patient-patient network for topology patterns on 2551 T2D patients. Red represents enrichment for female patients; blue represents enrichment for male patients.]

Three subgroups of type 2 diabetes identified (using topological analysis; in any case, a data-driven analysis).
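The patient-patient networks in Fig. 1 connect patients whose clinical feature vectors are sufficiently similar, and subgroups then emerge from the shape of the resulting graph. A much-simplified sketch of that idea, assuming toy feature vectors and plain cosine similarity, with connected components standing in for the paper's topological (Mapper-style) analysis:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_graph(patients, threshold=0.9):
    # Connect pairs of patients whose clinical feature vectors are similar.
    adj = {i: set() for i in range(len(patients))}
    for i in range(len(patients)):
        for j in range(i + 1, len(patients)):
            if cosine(patients[i], patients[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def components(adj):
    # Connected components of the similarity graph = candidate subgroups.
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.add(v)
            stack.extend(adj[v] - seen)
        comps.append(comp)
    return comps

# Toy "patients": two clearly different clinical profiles.
patients = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],   # profile A
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],   # profile B
]
subgroups = components(similarity_graph(patients))
```

On real EMR data the feature vectors are high-dimensional and the graph construction is far more careful, but the core move is the same: let group structure fall out of patient-patient similarity rather than predefined diagnostic categories.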
  • 154.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
  • 155.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford
  • 156.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford. Asthma classified into five subgroups.
  • 157.
Machine Learning for Healthcare (MLFHC) 2018 at Stanford. Treatment effects differ by subgroup.
  • 158.
  • 159.
Stages of autonomous vehicle implementation: 100% manual driving; cruise control and lane-departure warning; the car steers itself (driver supervises); handles congested roads on its own (driver may do other things); drives to the destination on its own (driver may even sleep); full automation (no human driving, no steering wheel).
  • 165.
Stages of autonomous vehicle implementation: 100% manual driving; cruise control and lane-departure warning; the car steers itself (driver supervises); handles congested roads on its own (driver may do other things); drives to the destination on its own (driver may even sleep); full automation (no human driving, no steering wheel); paradigm shift (human driving becomes illegal).
  • 166.
Stages of autonomous vehicle implementation: 100% manual driving; cruise control and lane-departure warning; the car steers itself (driver supervises); handles congested roads on its own (driver may do other things); drives to the destination on its own (driver may even sleep); full automation (no human driving, no steering wheel); paradigm shift (human driving becomes illegal). Markers indicate where today's autonomous vehicles and today's medical AI stand on this scale.
  • 167.
Driving by a human: illegal. Driving by AI: legal.
  • 168.
Reads fundus photographs and diagnoses diabetic retinopathy (DR) without any physician involvement.
  • 169.
Deep Learning Automatic Detection Algorithm for Malignant Pulmonary Nodules

Table 3: Patient Classification and Nodule Detection at the Observer Performance Test
(Test 1 = observer alone; Test 2 = observer with DLAD. Radiograph classification is reported as AUROC, nodule detection as JAFROC FOM; P values compare DLAD vs. Test 1 and Test 1 vs. Test 2 for each metric.)

Observer | T1 AUROC | T1 FOM | DLAD vs T1 P (AUROC/FOM) | T2 AUROC | T2 FOM | T1 vs T2 P (AUROC/FOM)
Nonradiology physicians
Observer 1 | 0.77 | 0.716 | <.001 / <.001 | 0.91 | 0.853 | <.001 / <.001
Observer 2 | 0.78 | 0.657 | <.001 / <.001 | 0.90 | 0.846 | <.001 / <.001
Observer 3 | 0.80 | 0.700 | <.001 / <.001 | 0.88 | 0.783 | <.001 / <.001
Group | | 0.691 | <.001* | | 0.828 | <.001*
Radiology residents
Observer 4 | 0.78 | 0.767 | <.001 / <.001 | 0.80 | 0.785 | .02 / .03
Observer 5 | 0.86 | 0.772 | .001 / <.001 | 0.91 | 0.837 | .02 / <.001
Observer 6 | 0.86 | 0.789 | .05 / .002 | 0.86 | 0.799 | .08 / .54
Observer 7 | 0.84 | 0.807 | .01 / .003 | 0.91 | 0.843 | .003 / .02
Observer 8 | 0.87 | 0.797 | .10 / .003 | 0.90 | 0.845 | .03 / .001
Observer 9 | 0.90 | 0.847 | .52 / .12 | 0.92 | 0.867 | .04 / .03
Group | | 0.790 | <.001* | | 0.867 | <.001*
Board-certified radiologists
Observer 10 | 0.87 | 0.836 | .05 / .01 | 0.90 | 0.865 | .004 / .002
Observer 11 | 0.83 | 0.804 | <.001 / <.001 | 0.84 | 0.817 | .03 / .04
Observer 12 | 0.88 | 0.817 | .18 / .005 | 0.91 | 0.841 | .01 / .01
Observer 13 | 0.91 | 0.824 | >.99 / .02 | 0.92 | 0.836 | .51 / .24
Observer 14 | 0.88 | 0.834 | .14 / .03 | 0.88 | 0.840 | .87 / .23
Group | | 0.821 | .02* | | 0.840 | .01*
Thoracic radiologists
Observer 15 | 0.94 | 0.856 | .15 / .21 | 0.96 | 0.878 | .08 / .03
Observer 16 | 0.92 | 0.854 | .60 / .17 | 0.93 | 0.872 | .34 / .02
Observer 17 | 0.86 | 0.820 | .02 / .01 | 0.88 | 0.838 | .14 / .12
Observer 18 | 0.84 | 0.800 | <.001 / <.001 | 0.87 | 0.827 | .02 / .02
Group | | 0.833 | .08* | | 0.854 | <.001*

Note: Observer 4 had 1 year of experience; observers 5 and 6, 2 years; observers 7-9, 3 years; observers 10-12, 7 years; observers 13 and 14, 8 years; observer 15, 26 years; observer 16, 13 years; observers 17 and 18, 9 years.
Observers 1-3 were 4th-year residents from obstetrics and gynecology, orthopedic surgery, and internal medicine, respectively.

Test 1 = physician alone; "AI vs. physician alone (P value)"; Test 2 = physician + AI; "physician vs. physician + AI (P value)".

Standalone AI: AUROC 0.91, JAFROC FOM 0.885
• "AI alone" was more accurate than "board-certified radiologist + AI" in most cases
• Classification: better than 6 of the 9 radiologists
• Nodule detection: better than all 9 radiologists
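The AUROC values in the table summarize radiograph classification: the probability that a randomly chosen radiograph with a malignant nodule receives a higher score than a randomly chosen normal one. A minimal sketch with made-up scores (the JAFROC FOM, which additionally credits correct nodule localization, is more involved and not shown):

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up nodule-probability scores; label 1 = radiograph with a malignant nodule.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
area = auroc(scores, labels)
```

A perfect reader scores 1.0 and chance performance is 0.5; here one abnormal case (score 0.3) ranks below a normal one (0.7), so the area falls just short of 1.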