The Future of Medicine, Digital Healthcare: With a Focus on Psychiatry

Professor, SAHIST, Sungkyunkwan University
Director, Digital Healthcare Institute
Yoon Sup Choi, Ph.D.

“It's in Apple's DNA that technology alone is not enough.  
It's technology married with liberal arts.”

The Convergence of IT, BT and Medicine

Korean Society of Radiology Spring Scientific Meeting, June 2017

Vinod Khosla
Founder, 1st CEO of Sun Microsystems
Partner of KPCB, CEO of Khosla Ventures
Legendary Venture Capitalist in Silicon Valley

“Technology will replace 80% of doctors”

https://www.youtube.com/watch?time_continue=70&v=2HMPRXstSvQ
“We should stop training radiologists right now.
It is obvious that deep learning will outperform radiologists within five years.”
Hinton on Radiology

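Map of healthcare-related fields (ver 0.3)
Healthcare
Health management in the broad sense, including what uses
no digital technology and is not specialized medical practice
e.g. exercise, nutrition, sleep
Digital healthcare
Health management that uses digital technology
e.g. Internet of Things, artificial intelligence, 3D printers
Mobile healthcare
The part of digital healthcare
that uses mobile technology
e.g. smartphones, Internet of Things, SNS
Personal genome analysis
e.g. cancer genomics, disease risk,
carrier status, drug sensitivity
e.g. wellness, ancestry analysis
Medicine
Specialized medical practice such as disease prevention,
treatment, prescription, and management
Telemedicine
Remote consultation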

What is the most important factor in digital medicine?

“Data! Data! Data!” he cried. “I can’t
make bricks without clay!”
- Sherlock Holmes, “The Adventure of the Copper Beeches”

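New data,

in new ways,

by new actors,

are measured, stored, integrated, and analyzed.

New data
• Types of data
• Qualitative and quantitative aspects of data

New ways
• Wearable devices
• Smartphones
• Genetic information analysis
• Artificial intelligence
• SNS

New actors
• Users/patients
• The public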

Digital Healthcare Industry Landscape (ver. 3)
Data Measurement, Data Integration, Data Interpretation, Treatment
Smartphone Gadget/Apps
DNA
Artificial Intelligence
2nd Opinion
Wearables / IoT
EMR/EHR
3D Printer
Counseling
Data Platform
Accelerator/early-VC
Telemedicine
Device
On Demand (O2O)
VR
Digital Healthcare Institute
Director, Yoon Sup Choi, Ph.D.
yoonsup.choi@gmail.com


Digital Phenotype
Your smartphone knows if you are depressed

Smartphone: the origin of healthcare innovation

2013?
The election of Pope Benedict
The Election of Pope Francis


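ResearchKit
• Users can share health and medical data measured with iPhone sensors with the platform
• Uses the accelerometer, microphone, gyroscope, GPS sensor, and others
• Steps, activity level, memory, voice tremor, and so on
• Addresses a long-standing problem of medical research: securing enough medical data
• Removes the physical and time barriers to enrolling study participants (once per 3 months ➞ once per second)
• Encourages the public to take part in medical research: more study participants
• Tens of thousands of people signed up as participants within 24 hours of the announcement
• Carried out with the user's own consent

ResearchKit
• The initial release introduced five apps covering five conditions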

http://www.roche.com/media/store/roche_stories/roche-stories-2015-08-10.htm

pRED app to track Parkinson’s symptoms in drug trial

The digital phenotype
Sachin H Jain, Brian W Powers, Jared B Hawkins & John S Brownstein
In the coming years, patient phenotypes captured to enhance health and wellness will extend to human interactions with digital technology.
In 1982, the evolutionary biologist Richard Dawkins introduced the concept of the “extended phenotype”¹, the idea that phenotypes should not be limited just to biological processes, such as protein biosynthesis or tissue growth, but extended to include all effects that a gene has on its environment inside or outside of the body of the individual organism. Dawkins stressed that many delineations of phenotypes are arbitrary. Animals and humans can modify their environments, and these modifications and associated behaviors are expressions of one’s genome and, thus, part of their extended phenotype. In the animal kingdom, he cites dam building by beavers as an example of the beaver’s extended phenotype¹.
As personal technology becomes increasingly embedded in human lives, we think there is an important extension of Dawkins’s theory: the notion of a ‘digital phenotype’. Can aspects of our interface with technology be somehow diagnostic and/or prognostic for certain conditions? Can one’s clinical data be linked and analyzed together with online activity and behavior data to create a unified, nuanced view of human disease? Here, we describe the concept of the digital phenotype. Although several disparate studies have touched on this notion, the framework for medicine has yet to be described. We attempt to define digital phenotype and further describe the opportunities and challenges in incorporating these data into healthcare.
… the manifestations of disease by providing a more comprehensive and nuanced view of the experience of illness. Through the lens of the digital phenotype, an individual’s interaction …
Figure 1 Timeline of insomnia-related tweets from representative individuals. Density distributions (probability density functions) are shown for seven individual users over a two-year period. Density on the y axis highlights periods of relative activity for each user. A representative tweet from each user is shown as an example. (x axis: Date; y axis: Density; series: User 1 to User 7.)
© 2015 Nature America, Inc. All rights reserved.
http://www.nature.com/nbt/journal/v33/n5/full/nbt.3223.html

genotype vs phenotype

“Extended Phenotype”

Digital Phenotype:
Your smartphone knows if you are depressed
Ginger.io

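• How often you text
• How long you talk on the phone
• Whom you call
• How far you travel
• How much you move
• UCSF, McLean Hospital: research on mental illness
• Novant Health: research on diabetes and postpartum depression
• UCSF, Duke: monitoring of recovery after surgery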

Digital Phenotype:
J Med Internet Res. 2015 Jul 15;17(7):e175.
The correlation analysis between the features and the PHQ-9 scores revealed that 6 of the 10
features were significantly correlated to the scores:
• strong correlation: circadian movement, normalized entropy, location variance
• correlation: phone usage features, usage duration and usage frequency

Digital Phenotype:
J Med Internet Res. 2015 Jul 15;17(7):e175.
Figure 4. Comparison of location and usage feature statistics between participants with no symptoms of depression (blue) and the ones with (red).
Feature values are scaled between 0 and 1 for easier comparison. Boxes extend between 25th and 75th percentiles, and whiskers show the range.
Horizontal solid lines inside the boxes are medians. One, two, and three asterisks show significant differences at P<.05, P<.01, and P<.001 levels,
respectively (ENT, entropy; ENTN, normalized entropy; LV, location variance; HS, home stay; TT, transition time; TD, total distance; CM, circadian
movement; NC, number of clusters; UF, usage frequency; UD, usage duration).
Figure 5. Coefficients of correlation between location features. One, two, and three asterisks indicate significant correlation levels at P<.05, P<.01,
and P<.001, respectively (ENT, entropy; ENTN, normalized entropy; LV, location variance; HS, home stay; TT, transition time; TD, total distance;
CM, circadian movement; NC, number of clusters).
Saeb et al., Journal of Medical Internet Research
Entropy: the variability of the time the participant spent at the location clusters
Circadian movement: to what extent the participants’ sequence of locations followed a circadian rhythm
Home stay
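The location features named on this slide can be illustrated with a small sketch. It is not the authors' code: the formulas (location variance as the log of the summed latitude and longitude variances, entropy as Shannon entropy over the share of samples in each cluster, normalized entropy as entropy divided by the log of the number of clusters, home stay as the share of samples at the home cluster) are assumptions based on how these features are commonly described, and the function names and toy cluster labels are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def location_variance(lats, lons):
    """Log of the summed variances of latitude and longitude (assumed definition).

    Raises ValueError if the participant never moved (zero variance).
    """
    return math.log(pvariance(lats) + pvariance(lons))


def entropy(cluster_labels):
    """Shannon entropy of the share of samples spent in each location cluster."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def normalized_entropy(cluster_labels):
    """Entropy divided by its maximum, log(number of clusters); 0.0 for one cluster."""
    n_clusters = len(set(cluster_labels))
    if n_clusters < 2:
        return 0.0
    return entropy(cluster_labels) / math.log(n_clusters)


def home_stay(cluster_labels, home_label):
    """Share of samples recorded at the cluster labeled as home."""
    return cluster_labels.count(home_label) / len(cluster_labels)


# Toy data (hypothetical): one cluster label per equally spaced GPS sample.
labels = ["home"] * 6 + ["work"] * 3 + ["gym"] * 1
print(entropy(labels), normalized_entropy(labels), home_stay(labels, "home"))
```

A participant who splits time evenly between places gets a normalized entropy of 1, while one who stays almost entirely in one place gets a value near 0, which is the kind of contrast the depression comparison on the slide relies on.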

The relationship between mobile phone location sensor data and depressive symptom severity
Sohrab Saeb1,2, Emily G. Lattie1, Stephen M. Schueller1, Konrad P. Kording2 and David C. Mohr1
1 Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
2 Rehabilitation Institute of Chicago, Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL, United States
Submitted 23 June 2016; Accepted 7 September 2016; Published 29 September 2016
Corresponding author: David C. Mohr, d-mohr@northwestern.edu
Academic editor: Anthony Jorm
DOI 10.7717/peerj.2537
Copyright 2016 Saeb et al. Distributed under Creative Commons CC-BY 4.0 (open access)
ABSTRACT
Background. Smartphones offer the hope that depression can be detected using
passively collected data from the phone sensors. The aim of this study was to replicate
and extend previous work using geographic location (GPS) sensors to identify depressive
symptom severity.
Methods. We used a dataset collected from 48 college students over a 10-week period,
which included GPS phone sensor data and the Patient Health Questionnaire 9-item
(PHQ-9) to evaluate depressive symptom severity at baseline and end-of-study. GPS
features were calculated over the entire study, for weekdays and weekends, and in 2-week
blocks.
Results. The results of this study replicated our previous findings that a number of
GPS features, including location variance, entropy, and circadian movement, were
significantly correlated with PHQ-9 scores (r’s ranging from 0.43 to 0.46, p-values
< .05). We also found that these relationships were stronger when GPS features were
calculated from weekend, compared to weekday, data. Although the correlation between
baseline PHQ-9 scores with 2-week GPS features diminished as we moved further from
baseline, correlations with the end-of-study scores remained significant regardless of the
time point used to calculate the features.
Discussion. Our findings were consistent with past research demonstrating that GPS
features may be an important and reliable predictor of depressive symptom severity.
The varying strength of these relationships on weekends and weekdays suggests the role
of weekend/weekday as a moderating variable. The finding that GPS features predict
depressive symptom severity up to 10 weeks prior to assessment suggests that GPS
features may have the potential as early warning signals of depression.
Subjects Bioinformatics, Psychiatry and Psychology, Public Health, Computational Science
Keywords Mobile phone, Depression, Depressive symptoms, Geographic locations, Students
INTRODUCTION
Depression is common and debilitating, taking an enormous toll in terms of cost,
morbidity, and mortality (Ferrari et al., 2013; Greenberg et al., 2015). The 12-month
prevalence of major depressive disorder among adults in the US is 6.9% (Kessler et al.,
2005), and an additional 2–5% have subsyndromal symptoms that warrant treatment
Saeb et al. (2016), PeerJ, DOI 10.7717/peerj.2537

The relationship between mobile phone location sensor
data and depressive symptom severity
Table 2 Linear correlation coefficients (r) between individual 10-week features and PHQ-9 scores, and their 95% confidence intervals. Features indicated with stars (*) are replicated from our previous study (Saeb et al., 2015a). Bold values indicate significant correlations.
Feature | Baseline (n = 46) | Follow-up (n = 38) | Change (n = 38)
Location variance* | 0.29 ± 0.008 | 0.43 ± 0.007 | 0.34 ± 0.008
Circadian movement* | 0.34 ± 0.006 | 0.48 ± 0.006 | 0.33 ± 0.009
Speed mean | 0.03 ± 0.007 | 0.06 ± 0.005 | 0.04 ± 0.008
Speed variance | 0.07 ± 0.007 | 0.06 ± 0.005 | 0.06 ± 0.007
Total distance* | 0.23 ± 0.004 | 0.18 ± 0.006 | 0.03 ± 0.006
Number of clusters* | 0.38 ± 0.005 | 0.44 ± 0.004 | 0.24 ± 0.007
Entropy* | 0.31 ± 0.007 | 0.46 ± 0.005 | 0.28 ± 0.008
Normalized entropy* | 0.26 ± 0.007 | 0.44 ± 0.005 | 0.30 ± 0.009
Raw entropy | 0.17 ± 0.009 | 0.22 ± 0.008 | 0.15 ± 0.010
Home stay* | 0.22 ± 0.008 | 0.43 ± 0.005 | 0.30 ± 0.009
Transition time* | 0.30 ± 0.006 | 0.32 ± 0.005 | 0.12 ± 0.009
Data analysis
We evaluated the relationship between each set of features (10-week and 2-week, each for all
days, weekends, or weekdays) and depressive symptoms severity as measured by the PHQ-9.
We used linear correlation coefficient (r) and considered p < 0.05 as the significance level.
In order to reduce the possibility that results were generated by chance, we created 1,000
bootstrap subsamples (Efron & Tibshirani, 1993) to estimate these correlation coefficients
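The bootstrap step described above can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say how the subsamples were drawn or how the intervals were formed, so resampling participants with replacement and taking a percentile interval are choices made here, and `pearson_r`, `bootstrap_r`, and the toy numbers are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r(x, y, n_boot=1000, seed=0):
    """Mean bootstrap correlation and a 95% percentile interval (assumed scheme)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with no variance
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    mean_r = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean_r, (lower, upper)


# Toy data (hypothetical): one feature value and one PHQ-9 score per participant.
feature = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.3, 0.2, 0.1]
phq9 = [2, 3, 5, 4, 8, 9, 7, 12, 14, 15]
print(bootstrap_r(feature, phq9))
```

Averaging over many resamples makes the reported coefficient less sensitive to any single participant, which matters with a sample of fewer than 50 people.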

Table 3 Linear correlation coefficients (r) between individual weekend and weekday features and PHQ-9 scores, and their 95% confidence intervals. Bold values indicate significant correlations (see ‘Data Analysis’).
Feature | Weekday Baseline (n = 46) | Weekday Follow-up (n = 38) | Weekday Change (n = 38) | Weekend Baseline (n = 46) | Weekend Follow-up (n = 38) | Weekend Change (n = 38)
Location variance | 0.15 ± 0.008 | 0.20 ± 0.008 | 0.22 ± 0.009 | 0.31 ± 0.008 | 0.47 ± 0.007 | 0.39 ± 0.008
Circadian movement | 0.22 ± 0.007 | 0.28 ± 0.008 | 0.25 ± 0.009 | 0.35 ± 0.007 | 0.51 ± 0.006 | 0.36 ± 0.008
Speed mean | 0.00 ± 0.008 | 0.06 ± 0.005 | 0.03 ± 0.008 | 0.13 ± 0.005 | 0.06 ± 0.006 | 0.05 ± 0.009
Speed variance | 0.05 ± 0.008 | 0.07 ± 0.005 | 0.02 ± 0.007 | 0.13 ± 0.004 | 0.05 ± 0.006 | 0.10 ± 0.008
Total distance | 0.20 ± 0.004 | 0.15 ± 0.005 | 0.01 ± 0.006 | 0.25 ± 0.004 | 0.20 ± 0.005 | 0.03 ± 0.006
Number of clusters | 0.19 ± 0.006 | 0.25 ± 0.005 | 0.14 ± 0.008 | 0.34 ± 0.006 | 0.46 ± 0.004 | 0.32 ± 0.007
Entropy | 0.21 ± 0.007 | 0.34 ± 0.006 | 0.20 ± 0.009 | 0.30 ± 0.008 | 0.55 ± 0.004 | 0.38 ± 0.008
Normalized entropy | 0.21 ± 0.008 | 0.39 ± 0.006 | 0.24 ± 0.009 | 0.28 ± 0.008 | 0.54 ± 0.004 | 0.41 ± 0.009
Raw entropy | 0.05 ± 0.008 | 0.04 ± 0.008 | 0.01 ± 0.010 | 0.04 ± 0.008 | 0.01 ± 0.008 | 0.03 ± 0.009
Home stay | 0.19 ± 0.008 | 0.37 ± 0.006 | 0.23 ± 0.009 | 0.23 ± 0.007 | 0.50 ± 0.004 | 0.35 ± 0.008
Transition time | 0.27 ± 0.006 | 0.29 ± 0.006 | 0.14 ± 0.010 | 0.36 ± 0.006 | 0.32 ± 0.008 | 0.06 ± 0.009
All of those 10-week features that were significantly related to PHQ-9 scores (see Table 2) were also significant when calculated from weekends, whereas only normalized entropy was significantly related to the scores as a weekday feature. The magnitude of the relationship between weekend features and PHQ-9 scores was larger than the magnitude of the relationship between 10-week features and PHQ-9 scores. However, given the small sample size, we were not adequately powered to test if these differences were significant.
2-week features
Finally, we examined how 2-week GPS features obtained at different times during the study

Mean temporal correlations between 2-week location features, calculated at different time points during the study, and
baseline and follow-up PHQ-9 scores.

Reece & Danforth, “Instagram photos reveal predictive markers of depression” (2016)
higher Hue (bluer)
lower Saturation (grayer)
lower Brightness (darker)

Digital Phenotype:
Your Instagram knows if you are depressed

Results
Both All-data and Pre-diagnosis models were decisively superior to a null model
($K_{All} = 157.5$; $K_{Pre} = 149.8$). All-data predictors were significant with 99% probability.
Pre-diagnosis and All-data confidence levels were largely identical, with two exceptions:
Pre-diagnosis Brightness decreased to 90% confidence, and Pre-diagnosis posting frequency
dropped to 30% confidence, suggesting a null predictive value in the latter case.
Increased hue, along with decreased brightness and saturation, predicted depression. This
means that photos posted by depressed individuals tended to be bluer, darker, and grayer (see
Fig. 2). The more comments Instagram posts received, the more likely they were posted by
depressed participants, but the opposite was true for likes received. In the All-data model, higher
posting frequency was also associated with depression. Depressed participants were more likely
to post photos with faces, but had a lower average face count per photograph than healthy
participants. Finally, depressed participants were less likely to apply Instagram filters to their
posted photos.

Fig. 2. Magnitude and direction of regression coefficients in All-data (N=24,713) and Pre-diagnosis (N=18,513)
models. X-axis values represent the adjustment in odds of an observation belonging to depressed individuals, per

Fig. 1. Comparison of HSV values. Right photograph has higher Hue (bluer), lower Saturation (grayer), and lower
Brightness (darker) than left photograph. Instagram photos posted by depressed individuals had HSV values
shifted towards those in the right photograph, compared with photos posted by healthy individuals.
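A minimal sketch of how per-photo HSV values like those compared in Fig. 1 could be computed. It assumes a photo is available as a list of RGB pixel tuples (decoding an image file is left out), and it averages hue arithmetically even though hue is a circular quantity; both are simplifications chosen here, and `mean_hsv` and the sample pixels are hypothetical.

```python
import colorsys


def mean_hsv(pixels):
    """Average hue, saturation, and value over (r, g, b) pixels in the 0-255 range.

    Returns three floats in [0, 1]. Hue is averaged arithmetically, which is a
    simplification because hue wraps around at 1.0.
    """
    h_sum = s_sum = v_sum = 0.0
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h_sum += h
        s_sum += s
        v_sum += v
        count += 1
    return h_sum / count, s_sum / count, v_sum / count


# Toy pixels (hypothetical): a bright orange photo and a dark gray-blue photo.
left = [(255, 165, 0)] * 4
right = [(60, 70, 90)] * 4
print(mean_hsv(left))
print(mean_hsv(right))
```

With these toy pixels the second photo has the higher hue (bluer), lower saturation (grayer), and lower value (darker), the same direction of shift the caption describes.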

Units of observation
In determining the best time span for this analysis, we encountered a difficult question:
When and for how long does depression occur? A diagnosis of depression does not indicate the
persistence of a depressive state for every moment of every day, and to conduct analysis using an
individual’s entire posting history as a single unit of observation is therefore rather specious. At
the other extreme, to take each individual photograph as units of observation runs the risk of
being too granular. De Choudhury et al. (5) looked at all of a given user’s posts in a single day,
and aggregated those data into per-person, per-day units of observation. We adopted this
precedent of “user-days” as a unit of analysis.
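The "user-days" unit of observation can be sketched as a simple grouping step. This assumes each post is a dictionary with a user id, an ISO timestamp, and one numeric feature; the field names, the choice of the mean as the aggregate, and the toy posts are hypothetical and not taken from the study.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_user_days(posts):
    """Group posts by (user, calendar day) and summarize each group."""
    groups = defaultdict(list)
    for post in posts:
        day = datetime.fromisoformat(post["timestamp"]).date()
        groups[(post["user_id"], day)].append(post)

    user_days = {}
    for key, items in groups.items():
        user_days[key] = {
            "n_posts": len(items),  # posting frequency for that day
            "mean_brightness": sum(p["brightness"] for p in items) / len(items),
        }
    return user_days


# Toy posts (hypothetical field names and values).
posts = [
    {"user_id": "u1", "timestamp": "2016-01-01T09:00:00", "brightness": 0.25},
    {"user_id": "u1", "timestamp": "2016-01-01T21:30:00", "brightness": 0.75},
    {"user_id": "u1", "timestamp": "2016-01-02T08:00:00", "brightness": 0.50},
    {"user_id": "u2", "timestamp": "2016-01-01T12:00:00", "brightness": 0.10},
]
for key, summary in aggregate_user_days(posts).items():
    print(key, summary)
```

Each resulting row is one observation for the regression, so a user who posts on many days contributes many observations rather than a single lifetime average.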

Statistical framework
We used Bayesian logistic regression with uninformative priors to determine the strength
of individual predictors. Two separate models were trained. The All-data model used all
collected data to address Hypothesis 1. The Pre-diagnosis model used all data collected from

Digital Phenotype:
($\chi^2_{All} = 907.84$, $p = 9.17\mathrm{e}{-164}$; $\chi^2_{Pre} = 813.80$, $p = 2.87\mathrm{e}{-144}$). In particular, depressed
participants were less likely than healthy participants to use any filters at all. When depressed
participants did employ filters, they most disproportionately favored the “Inkwell” filter, which
converts color photographs to black-and-white images. Conversely, healthy participants most
disproportionately favored the Valencia filter, which lightens the tint of photos. Examples of
filtered photographs are provided in SI Appendix VIII.

Fig. 3. Instagram filter usage among depressed and healthy participants. Bars indicate difference between observed
and expected usage frequencies, based on a Chi-squared analysis of independence. Blue bars indicate
disproportionate use of a filter by depressed compared to healthy participants, orange bars indicate the reverse.
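The observed-minus-expected bars in Fig. 3 follow from a standard chi-squared test of independence, which can be sketched as below. The counts are invented for illustration and are not the study's data; `filter_usage_gaps` is a hypothetical helper.

```python
def filter_usage_gaps(depressed_counts, healthy_counts):
    """Chi-squared statistic plus observed-minus-expected counts per filter.

    Expected counts assume filter choice is independent of group membership.
    The returned gaps are for the depressed group; a positive value means
    the filter is used more than expected by depressed participants.
    """
    filters = sorted(set(depressed_counts) | set(healthy_counts))
    dep_total = sum(depressed_counts.values())
    healthy_total = sum(healthy_counts.values())
    grand_total = dep_total + healthy_total

    chi2 = 0.0
    gaps = {}
    for name in filters:
        observed_dep = depressed_counts.get(name, 0)
        observed_healthy = healthy_counts.get(name, 0)
        row_total = observed_dep + observed_healthy
        expected_dep = row_total * dep_total / grand_total
        expected_healthy = row_total * healthy_total / grand_total
        gaps[name] = observed_dep - expected_dep
        chi2 += (observed_dep - expected_dep) ** 2 / expected_dep
        chi2 += (observed_healthy - expected_healthy) ** 2 / expected_healthy
    return chi2, gaps


# Toy counts (invented): number of photos posted with each filter.
depressed = {"Inkwell": 30, "Valencia": 10, "No filter": 60}
healthy = {"Inkwell": 10, "Valencia": 30, "No filter": 60}
chi2, gaps = filter_usage_gaps(depressed, healthy)
print(chi2, gaps)
```

In this toy table Inkwell comes out over-used and Valencia under-used by the depressed group, mirroring the direction of the bars in the figure.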

Digital Phenotype:

VIII. Instagram filter examples

Fig. S8. Examples of Inkwell and Valencia Instagram filters. Inkwell converts
color photos to black-and-white, Valencia lightens tint. Depressed participants
most favored Inkwell compared to healthy participants, Healthy participants

Your Twitter knows if you cannot sleep
Timeline of insomnia-related tweets from representative individuals.
Nat. Biotech. 2015

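• Distinguishes patients with bipolar disorder from people without it, based on tweet content and patterns

• Infers emotion by analyzing posting patterns, frequency, and word choice

• Phonology-based phonological features

• High-energy words: how strongly pronounced or intoned the words a user chooses are
• Tweets from 406 patients diagnosed with bipolar disorder

• Tweets from one year before diagnosis onward were compared with a control group
• The two groups could be separated with a precision above 90%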

Detection of the Prodromal Phase of Bipolar Disorder from
Psychological and Phonological Aspects in Social Media
Yen-Hao Huang
National Tsing Hua University
Hsinchu, Taiwan
yenhao0218@gmail.com
Lin-Hung Wei
Hsinchu, Taiwan
adeline80916@gmail.com
Yi-Shin Chen
Hsinchu, Taiwan
yishin@gmail.com
ABSTRACT
Seven out of ten people with bipolar disorder are initially
misdiagnosed and thirty percent of individuals with bipolar
disorder will commit suicide. Identifying the early phases of
the disorder is one of the key components for reducing the
full development of the disorder. In this study, we aim at
leveraging the data from social media to design predictive
models, which utilize the psychological and phonological fea-
tures, to determine the onset period of bipolar disorder and
provide insights on its prodrome. This study makes these dis-
coveries possible by employing a novel data collection process,
coined as Time-specific Subconscious Crowdsourcing, which
helps collect a reliable dataset that supplements diagnosis
information from people suffering from bipolar disorder. Our
experimental results demonstrate that the proposed models
could greatly contribute to the regular assessments of people
with bipolar disorder, which is important in the primary care
setting.
KEYWORDS
Bipolar Disorder Detection, Mental Disorder, Prodromal
Phrase, Emotion Analysis, Sentiment Analysis, Phonology,
Social Media
1 INTRODUCTION
Bipolar disorder (BD) is a common mental illness charac-
terized by recurrent episodes of mania/hypomania and de-
pression, which is found among all ages, races, ethnic groups
and social classes. The regular assessment of people with
BD is an important part of its treatment, though it may be
very time-consuming [21]. There are many beneficial treat-
ments for the patients, particularly for delaying relapses. The
identification of early symptoms is significant for allowing
early intervention and reducing the multiple adverse conse-
quences of a full-blown episode. Despite the importance of
the detection of prodromal symptoms, there are very few
studies that have actually examined the ability of relatives to
detect these symptoms in BD patients. [20] For the purpose
of early treatment, the challenge leads to: how to identify
the prodrome period of BD. Current studies are thus
aimed at detecting prodromes and analyzing the prodromal
symptoms of manic recurrence in clinics.
With regards to the symptom of social isolation, people
are increasingly turning to popular social media, such as
Facebook and Twitter, to share their illness experiences or
seek advice from others with similar mental health conditions.
As the information is being shared in public, people are
subconsciously providing rich contents about their states
of mind. In this paper, we refer to this sharing and data
collection as time-specific subconscious crowdsourcing.
In this study, we carefully look at patients who have been
diagnosed with BD and who explicitly indicate the diagnosis
and time of diagnosis on Twitter. Our goal is to both predict
whether BD rises on a given period of time, and to discover
the prodromal period for BD. It’s important to clarify that
our goal doesn’t seek to offer a diagnosis but rather to make
a prediction of which users are likely to be suffering from the
BD. The main contributions of our work are:
• Introducing the concept of time-specific subconscious
crowdsourcing, which can aid in locating the social
network behavior data of BD patients with the corre-
sponding time of diagnosis.
• A BD assessment mechanism that differentiates be-
tween prodromal symptoms and acute symptoms.
• Introducing the phonological features into the assess-
ment mechanism, which allows for the possibility to
assess patients through text only.
• An automatic recognition approach that detects the
possible prodromal period for BD.
2 RELATED WORK
Social media resources have been widely utilized by researchers
to study mental health issues. The following literature em-
phasizes on data collection and feature engineering, including
subject recruitment, manual data collection, data collection
applications, keyword matching, and combined approaches.
The clinical approach for mental disorders and prodrome
studies are also discussed in this section.
Subject recruitment: Based on customized question-
naires and contact with subjects, Park et al. [15] recruited
participants for the Center for Epidemiologic Studies Depres-
sion scale(CES-D) [17] and provided their Twitter data. By
analyzing the information contained in tweets, participants
were divided into normal and depressive groups based on
their scores on CES-D. An approach like this one requires ex-
pensive costs to acquire data and conduct the questionnaire.
Manual and automatic data collecting: Moreno et
al. [14] collected data via the Facebook profiles of college stu-
dents reviewed by two investigators. They aimed at revealing
the relationship between demographic factors and depression.
Similarly, in our work, we invest on manual efforts to collect
and properly annotate our dataset. In addition, there are
many applications built on top of social networks that provide
free services where users may need to input their credentials
arXiv:1712.09183v1 [cs.IR] 26 Dec 2017

Wordcloud
Table 2: Average Precision of Single Feature Performance
Features (#DIM) | 2 mths | 3 mths | 6 mths | 9 mths | 12 mths
AG (2), Age and Gender | 0.475 | 0.503 | 0.445 | 0.434 | 0.383
Pol (5), Mood Polarity Features | 0.911 | 0.893 | 0.843 | 0.836 | 0.803
Emot (8), Emotional Score | 0.893 | 0.895 | 0.908 | 0.917 | 0.896
Soc (4), Social Feature | 0.941 | 0.913 | 0.845 | 0.834 | 0.786
LT (1), Late Tweet Frequency | 0.645 | 0.589 | 0.554 | 0.504 | 0.513
TRD (1), Tweet Rate Difference | 0.570 | 0.638 | 0.626 | 0.615 | 0.654
Phon (8), Phonological Feature | 0.889 | 0.880 | 0.802 | 0.838 | 0.821
Figure 1: Illness Period Modeling (diagram labels: diagnosed time; an example period of 2 months)
features are introduced: (1) Word-level features and BD
Pattern of Life features.
3.4.1 Word-level Features. With respect to the linguistic features for BD, the Character n-gram language features (CLF) and LIWC metrics are designed to capture it. The CLF utilizes n-grams to measure the comment words or phrases used by users. The tf-idf is utilized in our score-calculating method, the tf is the frequency of an n-gram and the document d of df is defined as each particular twitter user k. The formula for the tf-idf is thus given as:
$$\mathrm{tfidf}^{(k,\tau,\alpha)}_{v_n} = \mathrm{freq}^{(k,\tau,\alpha)}_{v_n} \times \log \frac{K}{1 + \mathrm{freq}^{(K,\tau,\alpha)}_{v_n}} \qquad (1)$$
The $\mathrm{freq}^{(k)}_{v_n}$ is the frequency of n-gram $v_n$, which is $n \in \{1, 2\}$
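A rough sketch of the per-user tf-idf score in Equation (1). Several things are assumed here because the excerpt is cut off: the denominator term is read as the number of users whose tweets contain the n-gram, n-grams are built from word tokens, and the time-window indices are ignored by treating the input as already restricted to one window. The helper names and toy tokens are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_values=(1, 2)):
    """All unigrams and bigrams of a token list, as tuples."""
    grams = []
    for n in n_values:
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf_per_user(user_tokens):
    """Score each user's n-grams as tf * log(K / (1 + df)).

    user_tokens maps a user id to that user's tokens within one time window.
    tf is the n-gram count for the user; df is read here as the number of
    users who used the n-gram (an assumption); K is the number of users.
    """
    n_users = len(user_tokens)
    term_counts = {user: Counter(ngrams(tokens)) for user, tokens in user_tokens.items()}
    user_frequency = Counter()
    for counts in term_counts.values():
        user_frequency.update(counts.keys())  # each user counted once per n-gram
    return {
        user: {
            gram: count * math.log(n_users / (1 + user_frequency[gram]))
            for gram, count in counts.items()
        }
        for user, counts in term_counts.items()
    }


# Toy tokens (hypothetical), one short token list per user.
scores = tfidf_per_user({
    "a": ["sad", "sad", "day"],
    "b": ["happy", "day"],
    "c": ["happy", "night"],
})
print(scores["a"])
```

Because of the 1 added to the denominator, an n-gram used by all but one user scores zero and one used by every user scores below zero, so only n-grams that are distinctive for a user receive positive weight.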
to represent psychological features, su
terns and the behavioral tendency o
polarity, emotion, and social interacti
full BDPLF, there are five categories:
• Age and Gender: Sit et al. [2
effects on BD, indicating that wom
likely to have Bipolar Disorder
than men. We make use of the ag
proposed by Sap et al. [19], whic
social media.
• Mood Polarity Features: Ow
BD patients experience rapid mo
analysis is firstly adapted to obt
larity portrayed by each user’s t
the sentiment of tweets, the onlin
used, based on Go et al.’s work [
the contents of tweets into thre
positive, negative, and neutral.
those three categories into five
positive ratio, negative ratio, po
combo, and flips ratio.
• Emotional Scores: Beyond th
tion detection tool proposed by
employed to classify the tweets in
gories: joy, surprise, anticipation,
anger, and fear. The emotion cla
further transformed into emotio
$es^{(k)}_{e_i,\tau,\alpha} = \dfrac{e^{(k)}_{i,\tau,\alpha}}{ecount}$
Tweets from the year before diagnosis were compared against a control group
Even individual features alone showed high precision in classifying the at-risk group

AI for mental health
Artificial intelligence for psychiatry

No choice but to bring AI into medicine

Martin Duggan,“IBM Watson Health - Integrated Care & the Evolution to Cognitive Computing”

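•Weak AI (Artificial Narrow Intelligence)

• AI that excels in one specific area

• Chess, quizzes, mail filtering, product recommendation, autonomous driving

•Strong AI (Artificial General Intelligence)

• Human-level AI across all areas

• Reasoning, planning, problem solving, abstraction, learning complex concepts

•Super AI (Artificial Super Intelligence)

• AI that surpasses humans in every domain, including science, technology, and social skills

• “Any sufficiently advanced technology is indistinguishable from magic” - Arthur C. Clarke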

Jeopardy!
In 2011, IBM Watson faced two human champions in a quiz match and won decisively

IBM Watson on Medicine
Watson learned...
600,000 pieces of medical evidence
2 million pages of text from 42 medical journals and clinical trials
69 guidelines, 61,540 clinical trials
+
1,500 lung cancer cases
physician notes, lab results and clinical research
+
14,700 hours of hands-on training

Annals of Oncology (2016) 27 (suppl_9): ix179-ix180. 10.1093/annonc/mdw601
Validation study to assess performance of IBM cognitive
computing system Watson for oncology with Manipal
multidisciplinary tumour board for 1000 consecutive cases:  
An Indian experience
• MMDT(Manipal multidisciplinary tumour board) treatment recommendation and
data of 1000 cases of 4 different cancers breast (638), colon (126), rectum (124)
and lung (112) which were treated in last 3 years was collected.
• Of the treatment recommendations given by MMDT, WFO provided 50% in REC, 28% in FC, 17% in NREC
• Nearly 80% of the recommendations were in WFO REC and FC group
• 5% of the treatment provided by MMDT was not available with WFO
• The degree of concordance varied depending on the type of cancer
• WFO-REC was high in Rectum (85%) and least in Lung (17.8%)
• high with TNBC (67.9%); HER2 negative (35%) 
• WFO took a median of 40 sec to capture, analyze and give the treatment (vs MMDT took the median time of 15 min)
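The "nearly 80%" figure follows from adding the two WFO categories that count as agreement. A minimal sketch of that arithmetic, using the percentages on this slide; the variable names are hypothetical, and treating REC and FC as the concordant categories follows the definition quoted from the breast cancer paper later in the deck:

```python
# Share of MMDT recommendations falling in each WFO category (percent, from the slide).
wfo_categories = {"REC": 50, "FC": 28, "NREC": 17, "not available": 5}

# A case counts as concordant when WFO lists the board's choice as
# "recommended" (REC) or "for consideration" (FC).
CONCORDANT = {"REC", "FC"}

concordance = sum(pct for name, pct in wfo_categories.items() if name in CONCORDANT)
total = sum(wfo_categories.values())
```

Here `concordance` comes to 78, the "nearly 80%" reported above, and the four categories together account for all cases.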

WFO in ASCO 2017
• Early experience with IBM WFO cognitive computing system for lung and colorectal cancer treatment (Manipal Hospital)
• Over the past 3 years: lung cancer (112), colon cancer (126), rectum cancer (124)
• lung cancer: localized 88.9%, meta 97.9%
• colon cancer: localized 85.5%, meta 76.6%
• rectum cancer: localized 96.8%, meta 80.6%
Performance of WFO in India
2017 ASCO annual Meeting, J Clin Oncol 35, 2017 (suppl; abstr 8527)

ORIGINAL ARTICLE
Watson for Oncology and breast cancer treatment
recommendations: agreement with an expert
multidisciplinary tumor board
S. P. Somashekhar1*, M.-J. Sepúlveda2, S. Puglielli3, A. D. Norden3, E. H. Shortliffe4, C. Rohit Kumar1, A. Rauthan1, N. Arun Kumar1, P. Patil1, K. Rhee3 & Y. Ramya1
1 Manipal Comprehensive Cancer Centre, Manipal Hospital, Bangalore, India; 2 IBM Research (Retired), Yorktown Heights; 3 Watson Health, IBM Corporation, Cambridge; 4 Department of Surgical Oncology, College of Health Solutions, Arizona State University, Phoenix, USA
*Correspondence to: Prof. Sampige Prasannakumar Somashekhar, Manipal Comprehensive Cancer Centre, Manipal Hospital, Old Airport Road, Bangalore 560017, Karnataka, India. Tel: +91-9845712012; Fax: +91-80-2502-3759; E-mail: somashekhar.sp@manipalhospitals.com
Background: Breast cancer oncologists are challenged to personalize care with rapidly changing scientific evidence, drug
approvals, and treatment guidelines. Artificial intelligence (AI) clinical decision-support systems (CDSSs) have the potential to
help address this challenge. We report here the results of examining the level of agreement (concordance) between treatment
recommendations made by the AI CDSS Watson for Oncology (WFO) and a multidisciplinary tumor board for breast cancer.
Patients and methods: Treatment recommendations were provided for 638 breast cancers between 2014 and 2016 at the
Manipal Comprehensive Cancer Center, Bengaluru, India. WFO provided treatment recommendations for the identical cases in
2016. A blinded second review was carried out by the center’s tumor board in 2016 for all cases in which there was not
agreement, to account for treatments and guidelines not available before 2016. Treatment recommendations were considered
concordant if the tumor board recommendations were designated ‘recommended’ or ‘for consideration’ by WFO.
Results: Treatment concordance between WFO and the multidisciplinary tumor board occurred in 93% of breast cancer cases.
Subgroup analysis found that patients with stage I or IV disease were less likely to be concordant than patients with stage II or III
disease. Increasing age was found to have a major impact on concordance. Concordance declined significantly (P ≤ 0.02;
P < 0.001) in all age groups compared with patients <45 years of age, except for the age group 55–64 years. Receptor status
was not found to affect concordance.
Conclusion: Treatment recommendations made by WFO and the tumor board were highly concordant for breast cancer cases
examined. Breast cancer stage and patient age had significant influence on concordance, while receptor status alone did not.
This study demonstrates that the AI clinical decision-support system WFO may be a helpful tool for breast cancer treatment
decision making, especially at centers where expert breast cancer resources are limited.
Key words: Watson for Oncology, artificial intelligence, cognitive clinical decision-support systems, breast cancer,
concordance, multidisciplinary tumor board
Introduction
Oncologists who treat breast cancer are challenged by a large and
rapidly expanding knowledge base [1, 2]. As of October 2017, for
example, there were 69 FDA-approved drugs for the treatment of
breast cancer, not including combination treatment regimens
[3]. The growth of massive genetic and clinical databases, along
with computing systems to exploit them, will accelerate the speed
of breast cancer treatment advances and shorten the cycle time
for changes to breast cancer treatment guidelines [4, 5]. In add-
ition, these information management challenges in cancer care
are occurring in a practice environment where there is little time
available for tracking and accessing relevant information at the
point of care [6]. For example, a study that surveyed 1117 oncolo-
gists reported that on average 4.6 h per week were spent keeping
© The Author(s) 2018. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
All rights reserved. For permissions, please email: journals.permissions@oup.com.
Annals of Oncology 29: 418–423, 2018
doi:10.1093/annonc/mdx781
Published online 9 January 2018

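Tentative conclusions
•Concordance between Watson for Oncology and physicians:

•differs by cancer type.

•differs by stage even within the same cancer type.

•differs by hospital and country even for the same cancer type.

•may change over time.

Principles are needed
•For which patients should Watson be consulted?

•How much should Watson be trusted (per cancer type)?

•Should Watson's opinion be disclosed to the patient?

•What should be done when Watson and the medical team disagree?

•Can Watson be reimbursed by insurance?
The quality of care and treatment outcomes may vary depending on these criteria,

yet each hospital currently applies its own criteria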

Bone Age Assessment
• M: 28 Classes
• F: 20 Classes
• Method: G.P.
• Top3-95.28% (F)
• Top3-81.55% (M)

AI vs. physicians: accuracy (%)
AI: 69.5%
Physician A (radiology fellow, pediatric imaging subspecialty): 63%
Physician B (second-year radiology resident): 49.5%
AJR Am J Roentgenol. 2017 Dec;209(6):1374-1380.
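• Total number of patients: 200

• Physician A: radiologist subspecialized in pediatric imaging (more than 500 cases of reading experience)

• Physician B: second-year radiology resident (one day of training in the reading method + 20 cases read)

• Reference: consensus of two experienced pediatric radiologists (18 and 4 years of experience)

• AI: VUNO's deep learning model for bone age reading
Synergy between human physicians and AI in bone age reading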

AI vs. physicians and AI + physicians: accuracy (%)
AI: 69.5%
Physician A (radiology fellow, pediatric imaging subspecialty): 63% alone, 72.5% with AI
Physician B (second-year radiology resident): 49.5% alone, 57.5% with AI

Synergy between human physicians and AI in bone age reading

Total reading time (min), without AI vs. with AI
Physician A: 188 min without AI, 154 min with AI (saving 18% of time)
Physician B: 180 min without AI, 108 min with AI (saving 40% of time)
Using AI in bone age reading can also reduce reading time
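The two savings figures can be checked directly from the reading times on this slide (188 to 154 minutes and 180 to 108 minutes). A minimal sketch; the function name is hypothetical:

```python
def time_saving_percent(minutes_without_ai, minutes_with_ai):
    """Relative reduction in reading time, as a percentage of the time without AI."""
    return 100.0 * (minutes_without_ai - minutes_with_ai) / minutes_without_ai


saving_from_188 = time_saving_percent(188, 154)  # 34 of 188 minutes saved
saving_from_180 = time_saving_percent(180, 108)  # 72 of 180 minutes saved
```

Rounded to whole percentages these come to 18% and 40%, matching the labels on the chart.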





Detection of Diabetic Retinopathy

Copyright 2016 American Medical Association. All rights reserved.
Development and Validation of a Deep Learning Algorithm
for Detection of Diabetic Retinopathy
in Retinal Fundus Photographs
Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD;
Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB;
Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD
IMPORTANCE Deep learning is a family of computational methods that allow an algorithm to
program itself by learning from a large set of examples that demonstrate the desired
behavior, removing the need to specify rules explicitly. Application of these methods to
medical imaging requires further assessment and validation.
OBJECTIVE To apply deep learning to create an algorithm for automated detection of diabetic
retinopathy and diabetic macular edema in retinal fundus photographs.
DESIGN AND SETTING A specific type of neural network optimized for image classification
called a deep convolutional neural network was trained using a retrospective development
data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy,
diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists
and ophthalmology senior residents between May and December 2015. The resultant
algorithm was validated in January and February 2016 using 2 separate data sets, both
graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
EXPOSURE Deep learning–trained algorithm.
MAIN OUTCOMES AND MEASURES The sensitivity and specificity of the algorithm for detecting
referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy,
referable diabetic macular edema, or both, were generated based on the reference standard
of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2
operating points selected from the development set, one selected for high specificity and
another for high sensitivity.
RESULTS The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4
years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the
Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women;
prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm
had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and
0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high
specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity
was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%)
and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point
with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and
specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
CONCLUSIONS AND RELEVANCE In this evaluation of retinal fundus photographs from adults
with diabetes, an algorithm based on deep machine learning had high sensitivity and
specificity for detecting referable diabetic retinopathy. Further research is necessary to
determine the feasibility of applying this algorithm in the clinical setting and to determine
whether use of the algorithm could lead to improved care and outcomes compared with
current ophthalmologic assessment.
JAMA. doi:10.1001/jama.2016.17216
Published online November 29, 2016.
Author Affiliations: Google Inc,
Mountain View, California (Gulshan,
Peng, Coram, Stumpe, Wu,
Narayanaswamy, Venugopalan,
Widner, Madams, Nelson, Webster);
Department of Computer Science,
University of Texas, Austin
(Venugopalan); EyePACS LLC,
San Jose, California (Cuadros); School
of Optometry, Vision Science
Graduate Group, University of
California, Berkeley (Cuadros);
Aravind Medical Research
Foundation, Aravind Eye Care
System, Madurai, India (Kim); Shri
Bhagwan Mahavir Vitreoretinal
Services, Sankara Nethralaya,
Chennai, Tamil Nadu, India (Raman);
Verily Life Sciences, Mountain View,
California (Mega); Cardiovascular
Division, Department of Medicine,
Brigham and Women’s Hospital and
Harvard Medical School, Boston,
Massachusetts (Mega).
Corresponding Author: Lily Peng,
MD, PhD, Google Research, 1600
Amphitheatre Way, Mountain View,
CA 94043 (lhpeng@google.com).
Research
JAMA | Original Investigation | INNOVATIONS IN HEALTH CARE DELIVERY

Training Set / Test Set
• A CNN was trained retrospectively on 128,175 fundus images

• Data graded 3-7 times by 54 US ophthalmologists

• The AI's readings were compared with those of 7-8 highly qualified ophthalmologists

• EyePACS-1 (9,963 images), Messidor-2 (1,748 images)
eFigure 2. Screenshot of the Second Screen of the Grading Tool, Which Asks Graders to Assess the
Image for DR, DME and Other Notable Conditions or Findings

• AUC for EyePACS-1 and Messidor-2 = 0.991, 0.990

• Sensitivity and specificity on par with the 7-8 ophthalmologists

• F-score: 0.95 (vs. 0.91 for human physicians)
Additional sensitivity analyses were conducted for several
subcategories: (1) detecting moderate or worse diabetic reti-
effects of data set size on algorithm performance were exam-
ined and shown to plateau at around 60 000 images (or ap-
Figure 2. Validation Set Performance for Referable Diabetic Retinopathy
[ROC plots: sensitivity (%) vs. 1 – specificity (%), with the high-sensitivity and high-specificity operating points marked]
A, EyePACS-1: AUC, 99.1%; 95% CI, 98.8%-99.3%
B, Messidor-2: AUC, 99.0%; 95% CI, 98.6%-99.5%
Performance of the algorithm (black curve) and ophthalmologists (colored
circles) for the presence of referable diabetic retinopathy (moderate or worse
diabetic retinopathy or referable diabetic macular edema) on A, EyePACS-1
(8788 fully gradable images) and B, Messidor-2 (1745 fully gradable images).
The black diamonds on the graph correspond to the sensitivity and specificity of
the algorithm at the high-sensitivity and high-specificity operating points.
In A, for the high-sensitivity operating point, specificity was 93.4% (95% CI,
92.8%-94.0%) and sensitivity was 97.5% (95% CI, 95.8%-98.7%); for the
high-specificity operating point, specificity was 98.1% (95% CI, 97.8%-98.5%)
and sensitivity was 90.3% (95% CI, 87.5%-92.7%). In B, for the high-sensitivity
operating point, specificity was 93.9% (95% CI, 92.4%-95.3%) and sensitivity
was 96.1% (95% CI, 92.4%-98.3%); for the high-specificity operating point,
specificity was 98.5% (95% CI, 97.7%-99.1%) and sensitivity was 87.0% (95%
CI, 81.1%-91.0%). There were 8 ophthalmologists who graded EyePACS-1 and 7
ophthalmologists who graded Messidor-2. AUC indicates area under the
receiver operating characteristic curve.
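The two operating points in the caption are simply two thresholds on the same model score: a strict one that favors specificity and a lenient one that favors sensitivity. The sketch below illustrates the trade-off on made-up data; the labels, scores, and thresholds are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = referable DR, 0 = not referable.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

# Strict threshold: few false positives, but some disease is missed.
high_specificity_point = sensitivity_specificity(labels, scores, threshold=0.75)
# Lenient threshold: all disease is caught, at the cost of more false positives.
high_sensitivity_point = sensitivity_specificity(labels, scores, threshold=0.25)
```

Which point is preferable depends on the use case: screening usually favors the high-sensitivity point, while confirming a diagnosis favors the high-specificity one.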
Results

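•In April 2018, the FDA granted marketing authorization to an AI that reads fundus photographs to diagnose diabetic retinopathy (DR)

•IDx-DR: cloud-based software that reads photographs taken with the Topcon NW400

•Reads fundus photographs and determines the presence of DR without physician involvement

•Returns one of two answers

•1) More than mild DR detected: see a physician

•2) Negative for more than mild DR: get rescreened in 12 months
•Clinical trial and performance

•Multicenter analysis of data from 900 patients at 10 sites

•Sensitivity and specificity of 87.4% and 89.5%, respectively (lower than the Google AI in the JAMA paper)

•Reviewed by the FDA under the de novo premarket review pathway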

00 MONTH 2017 | VOL 000 | NATURE | 1
LETTER doi:10.1038/nature21056
Dermatologist-level classification of skin cancer
with deep neural networks
Andre Esteva1*, Brett Kuprel1*, Roberto A. Novoa2,3, Justin Ko2, Susan M. Swetter2,4, Helen M. Blau5 & Sebastian Thrun6
Skin cancer, the most common human malignancy1–3, is primarily
diagnosed visually, beginning with an initial clinical screening
and followed potentially by dermoscopic analysis, a biopsy and
histopathological examination. Automated classification of skin
lesions using images is a challenging task owing to the fine-grained
variability in the appearance of skin lesions. Deep convolutional
neural networks (CNNs)4,5 show potential for general and highly
variable tasks across many fine-grained object categories6–11.
Here we demonstrate classification of skin lesions using a single
CNN, trained end-to-end from images directly, using only pixels
and disease labels as inputs. We train a CNN using a dataset of
129,450 clinical images—two orders of magnitude larger than
previous datasets12—consisting of 2,032 different diseases. We
test its performance against 21 board-certified dermatologists on
biopsy-proven clinical images with two critical binary classification
use cases: keratinocyte carcinomas versus benign seborrheic
keratoses; and malignant melanomas versus benign nevi. The first
case represents the identification of the most common cancers, the
second represents the identification of the deadliest skin cancer.
The CNN achieves performance on par with all tested experts
across both tasks, demonstrating an artificial intelligence capable
of classifying skin cancer with a level of competence comparable to
dermatologists. Outfitted with deep neural networks, mobile devices
can potentially extend the reach of dermatologists outside of the
clinic. It is projected that 6.3 billion smartphone subscriptions will
exist by the year 2021 (ref. 13) and can therefore potentially provide
low-cost universal access to vital diagnostic care.
There are 5.4 million new cases of skin cancer in the United States2
every year. One in five Americans will be diagnosed with a cutaneous
malignancy in their lifetime. Although melanomas represent fewer than
5% of all skin cancers in the United States, they account for approxi-
mately 75% of all skin-cancer-related deaths, and are responsible for
over 10,000 deaths annually in the United States alone. Early detection
is critical, as the estimated 5-year survival rate for melanoma drops
from over 99% if detected in its earliest stages to about 14% if detected
in its latest stages. We developed a computational method which may
allow medical practitioners and patients to proactively track skin
lesions and detect cancer earlier. By creating a novel disease taxonomy,
and a disease-partitioning algorithm that maps individual diseases into
training classes, we are able to build a deep learning system for auto-
mated dermatology.
Previous work in dermatological computer-aided classification12,14,15
has lacked the generalization capability of medical practitioners
owing to insufficient data and a focus on standardized tasks such as
dermoscopy16–18 and histological image classification19–22. Dermoscopy
images are acquired via a specialized instrument and histological
images are acquired via invasive biopsy and microscopy; whereby
both modalities yield highly standardized images. Photographic
images (for example, smartphone images) exhibit variability in factors
such as zoom, angle and lighting, making classification substantially
more challenging23,24. We overcome this challenge by using a data-driven
approach—1.41 million pre-training and training images
make classification robust to photographic variability. Many previous
techniques require extensive preprocessing, lesion segmentation and
extraction of domain-specific visual features before classification. By
contrast, our system requires no hand-crafted features; it is trained
end-to-end directly from image labels and raw pixels, with a single
network for both photographic and dermoscopic images. The existing
body of work uses small datasets of typically less than a thousand
images of skin lesions16,18,19, which, as a result, do not generalize well
to new images. We demonstrate generalizable classification with a new
dermatologist-labelled dataset of 129,450 clinical images, including
3,374 dermoscopy images.
Deep learning algorithms, powered by advances in computation
and very large datasets25, have recently been shown to exceed human
performance in visual tasks such as playing Atari games26, strategic
board games like Go27 and object recognition6. In this paper we
outline the development of a CNN that matches the performance of
dermatologists at three key diagnostic tasks: melanoma classification,
melanoma classification using dermoscopy and carcinoma
classification. We restrict the comparisons to image-based classification.
We utilize a GoogleNet Inception v3 CNN architecture9 that was pre-trained
on approximately 1.28 million images (1,000 object categories)
from the 2014 ImageNet Large Scale Visual Recognition Challenge6,
and train it on our dataset using transfer learning28. Figure 1 shows the
working system. The CNN is trained using 757 disease classes. Our
dataset is composed of dermatologist-labelled images organized in a
tree-structured taxonomy of 2,032 diseases, in which the individual
diseases form the leaf nodes. The images come from 18 different
clinician-curated, open-access online repositories, as well as from
clinical data from Stanford University Medical Center. Figure 2a shows
a subset of the full taxonomy, which has been organized clinically and
visually by medical experts. We split our dataset into 127,463 training
and validation images and 1,942 biopsy-labelled test images.
To take advantage of fine-grained information contained within the
taxonomy structure, we develop an algorithm (Extended Data Table 1)
to partition diseases into fine-grained training classes (for example,
amelanotic melanoma and acrolentiginous melanoma). During
inference, the CNN outputs a probability distribution over these fine
classes. To recover the probabilities for coarser-level classes of interest
(for example, melanoma) we sum the probabilities of their descendants
(see Methods and Extended Data Fig. 1 for more details).
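The step of summing descendant probabilities can be illustrated with a small sketch. The class names follow the examples shown in the paper's Figure 1, but the probabilities and the two-level mapping are made-up values for illustration, and the helper name is hypothetical.

```python
from collections import defaultdict

# Hypothetical softmax output over fine-grained training classes.
fine_probs = {
    "acral-lentiginous melanoma": 0.50,
    "amelanotic melanoma": 0.30,
    "lentigo melanoma": 0.12,
    "blue nevus": 0.05,
    "halo nevus": 0.02,
    "Mongolian spot": 0.01,
}

# Assumed parent of each training class in the taxonomy.
parent_of = {
    "acral-lentiginous melanoma": "malignant melanocytic lesion",
    "amelanotic melanoma": "malignant melanocytic lesion",
    "lentigo melanoma": "malignant melanocytic lesion",
    "blue nevus": "benign melanocytic lesion",
    "halo nevus": "benign melanocytic lesion",
    "Mongolian spot": "benign melanocytic lesion",
}


def inference_class_probs(fine_probs, parent_of):
    """Sum the probabilities of all training classes under each inference class."""
    coarse = defaultdict(float)
    for fine_class, p in fine_probs.items():
        coarse[parent_of[fine_class]] += p
    return dict(coarse)


coarse_probs = inference_class_probs(fine_probs, parent_of)
```

With these toy numbers the malignant class receives about 0.92 and the benign class about 0.08, mirroring the "92% malignant / 8% benign" example in the figure.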
We validate the effectiveness of the algorithm in two ways, using
nine-fold cross-validation. First, we validate the algorithm using a
three-class disease partition—the first-level nodes of the taxonomy,
which represent benign lesions, malignant lesions and non-neoplastic
1 Department of Electrical Engineering, Stanford University, Stanford, California, USA. 2 Department of Dermatology, Stanford University, Stanford, California, USA. 3 Department of Pathology, Stanford University, Stanford, California, USA. 4 Dermatology Service, Veterans Affairs Palo Alto Health Care System, Palo Alto, California, USA. 5 Baxter Laboratory for Stem Cell Biology, Department of Microbiology and Immunology, Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, California, USA. 6 Department of Computer Science, Stanford University, Stanford, California, USA.
*These authors contributed equally to this work.
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

his task, the CNN achieves 72.1±0.9% (mean±s.d.) overall
he average of individual inference class accuracies) and two
gists attain 65.56% and 66.0% accuracy on a subset of the
set. Second, we validate the algorithm using a nine-class
rtition—the second-level nodes—so that the diseases of
have similar medical treatment plans. The CNN achieves
two trials, one using standard images and the other using
images, which reflect the two steps that a dermatologist m
to obtain a clinical impression. The same CNN is used for a
Figure 2b shows a few example images, demonstrating th
distinguishing between malignant and benign lesions, whic
visual features. Our comparison metrics are sensitivity an
[Figure 1 diagram: skin lesion image → deep convolutional neural network (Inception v3; layer types: convolution, AvgPool, MaxPool, concat, dropout, fully connected, softmax) → training classes (757), e.g. acral-lentiginous melanoma, amelanotic melanoma, lentigo melanoma, blue nevus, halo nevus, Mongolian spot → inference classes (varies by task), e.g. 92% malignant melanocytic lesion, 8% benign melanocytic lesion]
Deep CNN layout. Our classification technique is a
Data flow is from left to right: an image of a skin lesion
e, melanoma) is sequentially warped into a probability
over clinical classes of skin disease using Google Inception
hitecture pretrained on the ImageNet dataset (1.28 million
1,000 generic object classes) and fine-tuned on our own
29,450 skin lesions comprising 2,032 different diseases.
ning classes are defined using a novel taxonomy of skin disease
oning algorithm that maps diseases into training classes
(for example, acrolentiginous melanoma, amelanotic melano
melanoma). Inference classes are more general and are comp
or more training classes (for example, malignant melanocytic
class of melanomas). The probability of an inference class is c
summing the probabilities of the training classes according to
structure (see Methods). Inception v3 CNN architecture repr
from https://research.googleblog.com/2016/03/train-your-ow
classifier-with.html
GoogleNet Inception v3
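• Built a dataset of 129,450 skin lesion images

• Images drawn from 18 clinician-curated, open-access online repositories and Stanford University Medical Center

• Images learned with a CNN (Inception v3)

• AI readings compared with those of 21 dermatologists

• Distinguishing keratinocyte carcinoma from benign seborrheic keratosis

• Distinguishing malignant melanoma from benign lesions (standard images)

• Distinguishing malignant melanoma from benign lesions (dermoscopy images)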

Skin cancer classification performance of the CNN and dermatologists.
[Sensitivity vs. specificity plots. Panels: Melanoma: 130 images; Melanoma: 225 images; Melanoma: 111 dermoscopy images; Carcinoma: 707 images; Melanoma: 1,010 dermoscopy images; Carcinoma: 135 images. Legend: Algorithm: AUC = 0.96; Dermatologists (25); Dermatologists (22); Dermatologists (21); Average dermatologist]
Many of the 21 dermatologists were less accurate than the AI

The dermatologists' average performance was also below that of the AI

Assisting Pathologists in Detecting
Cancer with Deep Learning
• The localization score (FROC) for the algorithm reached 89%, which significantly
exceeded the score of 73% for a pathologist with no time constraint.
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html

• Algorithms need to be incorporated in a way that complements the pathologist’s workflow.
• Algorithms could improve the efficiency and consistency of pathologists.
• For example, pathologists could reduce their false negative rates (percentage of undetected tumors) by reviewing the top ranked predicted tumor regions including up to 8 false positive regions per slide.
https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html

Input & model size | Validation FROC | Validation @8FP | Validation AUC | Test FROC | Test @8FP | Test AUC
40X | 98.1 | 100 | 99.0 | 87.3 (83.2, 91.1) | 91.1 (87.2, 94.5) | 96.7 (92.6, 99.6)
40X-pretrained | 99.3 | 100 | 100 | 85.5 (81.0, 89.5) | 91.1 (86.8, 94.6) | 97.5 (93.8, 99.8)
40X-small | 99.3 | 100 | 100 | 86.4 (82.2, 90.4) | 92.4 (88.8, 95.7) | 97.1 (93.2, 99.8)
ensemble-of-3 | - | - | - | 88.5 (84.3, 92.2) | 92.4 (88.7, 95.6) | 97.7 (93.0, 100)
20X-small | 94.7 | 100 | 99.6 | 85.5 (81.0, 89.7) | 91.1 (86.9, 94.8) | 98.6 (96.7, 100)
10X-small | 88.7 | 97.2 | 97.7 | 79.3 (74.2, 84.1) | 84.9 (80.0, 89.4) | 96.5 (91.9, 99.7)
40X+20X-small | 94.9 | 98.6 | 99.0 | 85.9 (81.6, 89.9) | 92.9 (89.3, 96.1) | 97.0 (93.1, 99.9)
40X+10X-small | 93.8 | 98.6 | 100 | 82.2 (77.0, 86.7) | 87.6 (83.2, 91.7) | 98.6 (96.2, 99.9)
Pathologist [1] | - | - | - | 73.3* | 73.3* | 96.6
Camelyon16 winner [1, 23] | - | - | - | 80.7 | 82.7 | 99.4
Table 1. Results on Camelyon16 dataset (95% confidence intervals, CI). Bold indicates
results within the CI of the best model. “Small” models contain 300K parameters per
Inception tower instead of 20M. -: not reported. *A pathologist achieved this sensitivity
(with no FP) using 30 hours.
to 10-20% variance), and can confound evaluation of model improvements
by grouping multiple nearby tumors as one. By contrast, our non-maxima sup-
pression approach is relatively insensitive to r between 4 and 6, although less
accurate models benefited from tuning r using the validation set (e.g., 8). Fi-
The FROC evaluates tumor detection and localization
The FROC is defined as the average sensitivity at 0.25, 0.5, 1, 2, 4, and 8 average FPs per tumor-negative slide.
Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
Sensitivity at 8 false positives per image

Yun Liu et al. Detecting Cancer Metastases on Gigapixel Pathology Images (2017)
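• Google's AI showed large improvements in @8FP and FROC (92.9%, 88.5%)

•@8FP: the sensitivity achievable when up to 8 FPs are tolerated

•FROC: the mean of the sensitivities when 1/4, 1/2, 1, 2, 4, and 8 FPs per slide are allowed

•In other words, if a few FPs are tolerated, the AI can reach very high sensitivity

• Human pathologists reached a sensitivity of 73% but a specificity of nearly 100%
•Human pathologists and the AI pathologist are good at different things

•If the two work together, improvements can be expected in reading efficiency, consistency, and sensitivity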
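The two metrics above can be sketched in code. This is a minimal illustration under stated assumptions: the detection curve is a hypothetical list of (average FPs per slide, sensitivity) points, and the function names are made up; it is not the Camelyon16 evaluation script.

```python
from bisect import bisect_right

# FP rates (average false positives per tumor-negative slide) used by the FROC score.
FP_RATES = (0.25, 0.5, 1, 2, 4, 8)


def sensitivity_at(curve, fp_rate):
    """Highest sensitivity reached while allowing at most fp_rate FPs per slide.

    curve is a list of (avg_fps_per_slide, sensitivity) points sorted by FP count.
    """
    fps = [fp for fp, _ in curve]
    idx = bisect_right(fps, fp_rate)  # number of curve points within the FP budget
    if idx == 0:
        return 0.0
    return max(sens for _, sens in curve[:idx])


def froc_score(curve):
    """Mean sensitivity over the six predefined FP rates."""
    return sum(sensitivity_at(curve, r) for r in FP_RATES) / len(FP_RATES)


# Hypothetical detection curve for illustration only.
toy_curve = [(0.0, 0.70), (0.25, 0.80), (1.0, 0.85), (4.0, 0.90), (8.0, 0.92)]
toy_at_8fp = sensitivity_at(toy_curve, 8)
toy_froc = froc_score(toy_curve)
```

Because the FROC score averages in the strict low-FP settings, it is typically lower than the @8FP sensitivity for the same model, as in the table above.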

https://www.facebook.com/groups/TensorFlowKR/permalink/633902253617503/
Google engineers gave a keynote on medical AI at AACR 2018

AACR 2018: Using AI can reduce total reading time

AACR 2018: Using AI can improve reading accuracy (especially for micro)

BeyondVerbal: Reading emotions from voices

http://www.wsj.com/articles/SB10001424052702303824204579421242295627138

BeyondVerbal
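• What if machines could understand human emotions?

• Highly applicable to healthcare as well: detecting emotions such as sadness, depression, and fatigue

• Insurers already use it to detect depression among policyholders

• Aetna has used phone voice analysis since 2012 to detect depression among its customers

• Identified 6 times as many patients with depression as conventional methods

• Raises privacy concerns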

• linguistic
• identification and extraction of word instances (unigrams) and word-pair instances (bi-grams) from the transcriptions
• acoustic
• vocal dynamics
• voice quality
• vocal tract resonance frequencies
• pause lengths
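The linguistic features listed above (unigrams and bi-grams from transcriptions) can be extracted with a few lines of code. This is a minimal sketch: the tokenizer, the function name, and the sample transcript are hypothetical, and the study's own preprocessing may differ.

```python
import re
from collections import Counter


def ngram_counts(transcript, n):
    """Count word n-grams (as tuples) in a lowercased transcript.

    Tokens are runs of letters and apostrophes; n-grams may span sentence
    boundaries in this simplified version.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical interview answer for illustration.
sample = "I do not have hope. I do not."
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
```

The resulting counts would then serve as input features for a classifier, alongside the acoustic features.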
A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects: A
Prospective Multicenter Trial
• “Do you have hope?”
• “Do you have any fear?”
• “Do you have any secrets?”
• “Are you angry?”
• “Does it hurt emotionally?”
Pestian, Suicide and Life-Threatening Behavior, 2016

A Machine Learning Approach to Identifying the
Thought Markers of Suicidal Subjects: A
Prospective Multicenter Trial
SUICIDE THOUGHT MARKERS
[ROC plots: sensitivity vs. specificity]
Figure 1. Receiver operator curve (ROC): suicide versus control (upper), suicide versus mentally ill (middle), and
suicide versus mentally ill with control. The ROC curves for adolescents (blue), adults (red), and a
generated where the nonsuicidal population is controls (top), mentally ill (middle), and mentally
using linguistic and acoustic features. The gray line is the AROC curve for a baseline (random) cla
TABLE 2
The AROC for the Machine Learning Algorithm. The Nonsuicidal Group Comprises of Either Mentally Ill and Control Subjects. Classification
Performances are Shown for Adolescents, Adults, and the Combined Adolescent and Adult Cohorts

ROC (SD)                      Adolescents    Adults         Adolescents + Adults
Suicidal versus Controls
  Linguistics                 0.87 (0.04)    0.91 (0.02)    0.93 (0.02)
  Acoustics                   0.74 (0.05)    0.82 (0.03)    0.79 (0.03)
  Linguistics + Acoustics     0.83 (0.05)    0.93 (0.02)    0.92 (0.02)
Suicidal versus Mentally Ill
  Linguistics                 0.82 (0.05)    0.77 (0.04)    0.79 (0.03)
  Acoustics                   0.69 (0.06)    0.74 (0.04)    0.76 (0.03)
  Linguistics + Acoustics     0.80 (0.05)    0.77 (0.04)    0.82 (0.03)
Suicidal versus Mentally Ill and Controls
  Linguistics                 0.82 (0.04)    0.84 (0.03)    0.87 (0.02)
  Acoustics                   0.74 (0.05)    0.80 (0.03)    0.76 (0.03)
  Linguistics + Acoustics     0.81 (0.04)    0.84 (0.03)    0.87 (0.02)
Suicidal vs. Control Suicidal vs. Mentally Ill Suicidal vs. Mentally Ill and Controls
adolescents
adults
Pestian, Suicide and Life-Threatening Behavior, 2016
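The AROC values reported above are areas under ROC curves. One way to read such a number: it is the probability that a randomly chosen suicidal subject receives a higher classifier score than a randomly chosen nonsuicidal subject. The sketch below computes it by direct pairwise comparison; the function name and the example scores are hypothetical and are not taken from the study.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up classifier scores, for illustration only.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

A value of 0.5 corresponds to the random baseline (the gray line in the figure), and 1.0 to perfect separation of the two groups.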

“A lot of Syrian refugees have trauma and maybe
this can help them overcome that.” However, he
points out that there is a stigma around
psychotherapy, saying people feel shame about
seeking out psychologists.
As a result he thinks people might feel more
comfortable knowing they are talking to a “robot”
than to a human.

http://www.newyorker.com/tech/elements/the-chatbot-will-see-you-now
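•About three-quarters of Syrian refugees show symptoms such as anxiety, isolation, and insomnia

•Finding thousands of Arabic-speaking counselors is not feasible

•Karim, a chatbot from the Silicon Valley startup X2AI

•emotion-recognition algorithm

•Analyzes parameters such as expressions, word choice, typing speed, sentence length, and grammar (passive vs. active voice)

•In Syria, where expressing anger or sadness is taboo,
patients actually feel more comfortable talking with a chatbot

•Prof. David Spiegel, Stanford (psychiatry)

•Diagnostic capability looks promising; it remembers every conversation

•Giving a sense of being cared for, or handling transference, is likely to be difficult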
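To make the kind of parameters mentioned above (typing speed, sentence length) concrete, here is a minimal sketch that derives a few surface features from one chat message. It is not X2AI's algorithm, which the slides do not describe; the class and function names, the sentence splitter, and the idea of receiving the typing duration in seconds are all assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class MessageFeatures:
    chars_per_second: float      # crude typing-speed proxy
    word_count: int
    mean_sentence_length: float  # in words


def extract_features(text: str, typing_seconds: float) -> MessageFeatures:
    """Compute simple surface features for one message (hypothetical helper)."""
    if typing_seconds <= 0:
        raise ValueError("typing_seconds must be positive")
    # Naive sentence split on ., ! and ? (an assumption, not a real parser).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    mean_len = (
        sum(len(s.split()) for s in sentences) / len(sentences)
        if sentences else 0.0
    )
    return MessageFeatures(len(text) / typing_seconds, len(words), mean_len)


features = extract_features("I am tired. I cannot sleep at night.", 12.0)
```

Detecting passive versus active voice would need a real syntactic parser and is left out of this sketch.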

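• Woebot, a mental health counseling chatbot startup

• A chatbot for treating depression (via cognitive behavioral therapy), started by mental health experts at Stanford

• Prof. Andrew Ng serves as chairman of the board

• Woebot, a mental health counseling chatbot

• Explains things conversationally and checks the user's mental health status, as human counselors do

• Mostly no different from a questionnaire (the user picks one of the preset answers), but it can be seen as an innovation in UI

• Does not yet use very sophisticated NLP (about once per session)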

Original Paper
Delivering Cognitive Behavior Therapy to Young Adults With
Symptoms of Depression and Anxiety Using a Fully Automated
Conversational Agent (Woebot): A Randomized Controlled Trial
Kathleen Kara Fitzpatrick1*, PhD; Alison Darcy2*, PhD; Molly Vierhile1, BA
1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
2 Woebot Labs Inc., San Francisco, CA, United States
* these authors contributed equally
Corresponding Author:
Alison Darcy, PhD
Woebot Labs Inc.
55 Fair Avenue
San Francisco, CA, 94110
United States
Email: alison@woebot.io
Abstract
Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by
poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated
conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and
depression.
Methods: In an unblinded trial, 70 individuals age 18-28 years were recruited online from a university community social media
site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a
conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental
Health ebook, “Depression in College Students,” as an information-only control group (n=36). All participants completed
Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7),
and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and
Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23)
times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants
provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on
depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as
measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers,
participants in both groups significantly reduced anxiety as measured by the GAD-7 (F1,54= 9.24; P=.004). Participants’ comments
suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional
therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT.
(JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785
KEYWORDS
conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health
Introduction
Up to 74% of mental health diagnoses have their first onset
particularly common among college students, with more than
half reporting symptoms of anxiety and depression in the
previous year that were so severe they had difficulty functioning
JMIR MENTAL HEALTH    Fitzpatrick et al

depression at baseline as measured by the PHQ-9, while
three-quarters (74%, 52/70) were in the severe range for anxiety
as measured by the GAD-7.
Figure 1. Participant recruitment flow.
Table 1. Demographic and clinical variables of participants at baseline.

                          Information control    Woebot
Scale, mean (SD)
  Depression (PHQ-9)      13.25 (5.17)           14.30 (6.65)
  Anxiety (GAD-7)         19.02 (4.27)           18.05 (5.89)
  Positive affect         26.19 (8.37)           25.54 (9.58)
  Negative affect         28.74 (8.92)           24.87 (8.13)
Age, mean (SD)            21.83 (2.24)           22.58 (2.38)
Gender, n (%)
  Male                    4 (7)                  7 (21)
  Female                  20 (55)                27 (79)
Ethnicity, n (%)
  Latino/Hispanic         2 (8)                  2 (6)
  Non-Latino/Hispanic     22 (92)                32 (94)
  Caucasian               18 (75)                28 (82)
Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated
Conversational Agent (Woebot): A Randomized Controlled Trial
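•A self-help chatbot for college students who self-identify as having symptoms of anxiety and depression

•Objective: to assess the chatbot's feasibility, acceptability, and preliminary efficacy

•Conducted over 2 weeks with a total of 70 college students

•Intervention group (Woebot): 34

•Control group (information-only): 36

•Outcome: PHQ-9, GAD-7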
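For readers unfamiliar with the primary outcome measure, the sketch below shows how a PHQ-9 total is formed and mapped to a severity band. The PHQ-9 has nine items scored 0 to 3, and the bands used here are the commonly cited cut-offs (0-4, 5-9, 10-14, 15-19, 20-27). The function names are hypothetical, and nothing here is taken from the Woebot study's own analysis code.

```python
# Upper bound of each band (inclusive) and its conventional label.
PHQ9_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def phq9_total(item_scores):
    """Sum the nine PHQ-9 item scores (each must be 0, 1, 2, or 3)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0 to 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a PHQ-9 total (0 to 27) to its severity band."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    for upper, label in PHQ9_BANDS:
        if total <= upper:
            return label


total = phq9_total([2, 2, 2, 2, 2, 1, 1, 1, 1])
band = phq9_severity(total)
```

Under these cut-offs, group means of roughly 13 to 14 points would fall in the moderate band.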

                        Information-only control         Woebot
                        T2 (a)          95% CI (b)       T2 (a)          95% CI (b)       F       P       d (c)
PHQ-9                   13.67 (.81)     12.07-15.27      11.14 (0.71)    9.74-12.32       6.03    .017    0.44
GAD-7                   16.84 (.67)     15.52-18.56      17.35 (0.60)    16.16-18.13      0.38    .581    0.14
PANAS positive affect   26.02 (1.45)    23.17-28.86      26.88 (1.29)    24.35-29.41      0.17    .707    0.02
PANAS negative affect   27.53 (1.42)    24.73-30.32      25.98 (1.24)    23.54-28.42      0.91    .912    0.344
(a) Baseline=pooled mean (standard error)
(b) 95% confidence interval.
(c) Cohen d shown for between-subjects effects using means and standard errors at Time 2.
Figure 2. Change in mean depression (PHQ-9) score by group over the study period. Error bars represent standard error.
Preliminary Efficacy
Table 2 shows the results of the primary ITT analyses conducted
on the entire sample. Univariate ANCOVA revealed a significant
treatment effect on depression revealing that those in the Woebot
group significantly reduced PHQ-9 score while those in the
information control group did not (F1,48=6.03; P=.017) (see
Figure 2). This represented a moderate between-groups effect
size (d=0.44). This effect is robust after Bonferroni correction
for multiple comparisons (P=.04). No other significant
between-group differences were observed on anxiety or affect.
Completer Analysis
As a secondary analysis, to explore whether any main effects
existed, 2x2 repeated measures ANOVAs were conducted on
the primary outcome variables (with the exception of PHQ-9)
among completers only. A significant main effect was observed
on GAD-7 (F1,54=9.24; P=.004) suggesting that completers
experienced a significant reduction in symptoms of anxiety
between baseline and T2, regardless of the group to which they
were assigned with a within-subjects effect size of d=0.37. No
main effects were observed for positive (F1,50=.001; P=.951;
d=0.21) or negative affect (F1,50=.06; P=.80; d=0.003) as
measured by the PANAS.
To further elucidate the source and magnitude of change in
depression, repeated measures dependent t tests were conducted
and Cohen d effect sizes were calculated on individual items of
the PHQ-9 among those in the Woebot condition. The analysis
revealed that baseline-T2 changes were observed on the
following items in order of decreasing magnitude: motoric
symptoms (d=2.09), appetite (d=0.65), little interest or pleasure
in things (d=0.44), feeling bad about self (d=0.40), and
concentration (d=0.39), and suicidal thoughts (d=0.30), feeling
down (d=0.14), sleep (d=0.12), and energy (d=0.06).
JMIR Ment Health 2017 | vol. 4 | iss. 2 | e19 | http://mental.jmir.org/2017/2/e19/
Change in mean depression (PHQ-9) score
by group over the study period
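•Results

•The chatbot was used an average of 12.14 times over the 2 weeks

•Significant group difference for depression

•The Woebot group showed a significant reduction in depression (PHQ-9)

•No significant reduction in the control group

•For anxiety, both groups showed a significant reduction (GAD-7, completer analysis)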

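•Woebot grew through 2017 while collecting large amounts of data

•Although it was offered only through Facebook Messenger, it grew by more than 50% per month,

•accumulating more than 2M messages per week and users in more than 130 countries

•iOS and Android apps launched in early 2018
•In March 2018, Woebot raised an $8M Series A round

•Led by New Enterprise Associates (NEA), with participation from Prof. Andrew Ng's AI Fund

•AI Fund: a $175M fund for investing in AI startups, formed by Prof. Andrew Ng in January 2018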

G.M. Lucas et al. / Computers in Human Behavior 37 (2014) 94–100
Who builds rapport better: a human doctor or an AI doctor?

It’s only a computer:
Virtual humans increase willingness to disclose
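Frame
Participant believes an AI is conducting the interview (computer frame)
Participant believes a person is conducting the interview remotely (human frame)
Method
An AI actually conducts the interview (AI)
A person actually conducts the interview (Tele-operated)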

“How close are you to your family?”
“Tell me about a situation that you wish you had handled differently.”
“Tell me about an event, or something that you wish you could erase from your memory.”
“Tell me about the hardest decision you’ve ever had to make.”
“Tell me about the last time you felt really happy.”
“What are you most proud of in your life?”
“What’s something you feel guilty about?”
“When was the last time you argued with someone and what was it about?”

[Bar charts comparing the computer frame and the human frame on four measures: Fear of Self-disclosure, Impression Management, Sadness Displays, and Willingness to Disclose.]

“This is way better than talking to a person. I don’t really feel
comfortable talking about personal stuff to other people.”
“A human being would be judgmental. I shared a lot of
personal things, and it was because of that.”

Digital Therapeutics
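Digital drugs

•Digiceutical = digital + pharmaceutical

•“After chemicals and proteins, digital drugs will become the third class of new medicines”

•Digital drugs fall into two broad types

•Those that replace an existing drug entirely

•Those that augment an existing drug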

• reSET® was evaluated in a clinical trial of 507 patients with SUD across 10 treatment centers nation-wide over 12 weeks.*
• In patients who were dependent on stimulants, marijuana, cocaine, or alcohol (n=395), 58.1% of patients receiving
reSET®* were abstinent in study weeks 9-12, while 29.8% of patients receiving face-to-face therapy alone were abstinent
during the same time frame (p<0.01).
• Among participants who tested positive for drug use at the start of the study (n=191), 26.7% of patients receiving reSET®* were
abstinent in study weeks 9-12, while 3.2% of patients receiving traditional face-to-face therapy were abstinent during the
same time frame (p<0.01).
Pear Therapeutics
Campbell et al. Am J Psychiatry. 2014.

Campbell et al. Am J Psychiatry. 2014.
Pear Therapeutics
• Patients receiving reSET® showed statistically significant improvement in retention compared to face-to-face therapy alone
(p=0.0316). At the end of 12 weeks of treatment 59% of patients receiving face-to-face therapy were retained in the study
compared to 67% of patients receiving reSET®.

Pear Therapeutics
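•The first smartphone app to receive FDA de novo clearance as a digital therapeutic for treating disease
(the first time a system consisting of an app alone, with no device, was cleared for the purpose of treating disease)

•reSET, a system from Pear Therapeutics: an app for treating various addictions

•Treats addiction to and dependence on marijuana, cocaine, and alcohol over 12 weeks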

© 2017 by HURAYPOSITIVE INC., a Digital Healthcare Service Provider. This information is strictly privileged and confidential. All rights reserved.
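key facts
Products & Services
Service targets & role
Patient mix: type 2 diabetes 95%, type 1 diabetes 2%, gestational diabetes 2%, other 1%
Disease stages: healthy, prediabetes, diabetes, diabetes with mild complications, diabetes with severe complications
Huraypositive healthcare solution: a healthcare solution for actively preventing progression to the next risk stage
Ministry of Health and Welfare / National Health Insurance Service (national health promotion and management)
Hospitals / pharmaceutical companies / insurers (cost reduction and customer satisfaction)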

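key facts
Products & Services
Medical efficacy (clinical trial using Health Switch)
[Chart: HbA1c (%), axis from 7 to 8.2, at 0M, 3M, 6M, 9M, and 12M, annotated ▼0.63%p., ▼0.64%p., and ▼0.04%p.]
Period
• Phase 1 (0M-6M)
Intervention group (with intervention): HbA1c decreased by 0.63%p.
Control group (without intervention): no meaningful change
• Phase 2: intervention and control groups crossed over (6M-12M)
Control group (without intervention): HbA1c level maintained
Intervention group (with intervention): HbA1c decreased by 0.64%p.
Trial participants
• N = 148
• Mean age: 52.2 years
• Characteristics: people with type 2 diabetes
• Period: 2014.10 ~ 2015.12
Results
Effects of Health Switch verified through a clinical trial
1 A meaningful blood-glucose-lowering effect of the mobile intervention service
2 Lifestyle changes may be sustained after about 6 months of service
3 A simple service that older patients can also use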

SCIENTIFIC REPORTS | (2018) 8:3642 | DOI:10.1038/s41598-018-22034-0
www.nature.com/scientificreports
The effectiveness, reproducibility,
and durability of tailored mobile
coaching on diabetes management
in policyholders: A randomized,
controlled, open-label study
Da Young Lee1,2, Jeongwoon Park3, Dooah Choi3, Hong-Yup Ahn4, Sung-Woo Park1 & Cheol-Young Park1
This randomized, controlled, open-label study conducted in Kangbuk Samsung Hospital evaluated
the effectiveness, reproducibility, and durability of tailored mobile coaching (TMC) on diabetes
management. The participants included 148 Korean adult policyholders with type 2 diabetes divided
into the Intervention-Maintenance (I-M) group (n=74) and Control-Intervention (C-I) group (n=74).
Intervention was the addition of TMC to typical diabetes care. In the 6-month phase 1, the I-M group
received TMC, and the C-I group received their usual diabetes care. During the second 6-month phase
2, the C-I group received TMC, and the I-M group received only regular information messages. After
the 6-month phase 1, a significant decrease (0.6%) in HbA1c levels compared with baseline values was
observed in only the I-M group (from 8.1±1.4% to 7.5±1.1%, P<0.001 based on a paired t-test).
At the end of phase 2, HbA1c levels in the C-I group decreased by 0.6% compared with the value at 6
months (from 7.9±1.5 to 7.3±1.0, P<0.001 based on a paired t-test). In the I-M group, no changes
were observed. Both groups showed significant improvements in frequency of blood-glucose testing
and exercise. In conclusion, addition of TMC to conventional treatment for diabetes improved glycemic
control, and this effect was maintained without individualized message feedback.
The incidence and prevalence of type 2 diabetes are increasing rapidly worldwide, and the disease is expected
to affect 439 million adults by 2030 [1]. Previous large clinical trials indicated that adequate glycemic control
contributed to a reduction in both microvascular and macrovascular complications as well as mortality rates due to
diabetes [2,3]. Complications from diabetes result in greater expenditure and reduced productivity. Therefore, it is a
socioeconomic concern [4,5]. Adequate glycemic control is important not only as an individual health problem, but
also as a challenge to healthcare systems worldwide.
However, approximately 40% of subjects with diabetes in the United States do not meet the recommended
target for glycemic control, low-density lipoprotein cholesterol (LDL-C) level, or blood pressure (BP) [6]. In Korea,
glycated hemoglobin (HbA1c) levels for nearly half of diabetic patients were above 7.0% [7].
Although successful diabetes care requires therapeutic lifestyle modification in addition to proper
medication [8–10], only 55% of individuals with type 2 diabetes receive diabetes education from healthcare professionals [11],
and 16% report adhering to recommended self-management activities [9]. Multifaceted professional
interventions are needed to support patient efforts for behavior change including healthy lifestyle choices, disease
self-management, and prevention of diabetes complications [10].
1 Division of Endocrinology and Metabolism, Department of Internal Medicine, Kangbuk Samsung Hospital,
Sungkyunkwan University School of Medicine, Seoul, Republic of Korea. 2 Division of Endocrinology and Metabolism,
Department of Internal Medicine, Korea University College of Medicine, Seoul, Republic of Korea. 3 Huraypositive Inc.
Sinsa-dong, Gangnam-gu, Seoul, Republic of Korea. 4 Department of Statistics, Dongguk University-Seoul, Seoul,
Republic of Korea. Correspondence and requests for materials should be addressed to C.-Y.P. (email: cydoctor@chol.com)
Received: 29 November 2017
Accepted: 15 February 2018
Published: xx xx xxxx
OPEN

Figure 3. Changes in means and standard errors of glycated hemoglobin (H
study period.
HbA1c levels of the C-I group who received TMC during phase 2 of the study
decreased by 0.6% compared to phase 1 levels. In the I-M group, initial
improvement in HbA1c levels at 3 months continued until 12 months.
Consequently, HbA1c levels in both the C-I and I-M groups decreased
significantly compared to baseline values over the 12-month study period.

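Prolonged Exposure Therapy

Limitations of prolonged exposure therapy
• Patients resist recalling the trauma, or cannot imagine it effectively

• In fact, this avoidance is itself one of the symptoms of PTSD

• If the patient cannot vividly visualize the traumatic memory, the treatment is less effective
How can we present a realistic, visualized situation to the patient?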

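Virtual Vietnam
•VR has been used to treat PTSD since the 1990s

•First attempt: Virtual Vietnam (1997)

• Scenarios: pushing through a jungle / a military helicopter flying overhead

• Graphics quality, rendering effects, and scenarios were limited

• All patients who had not responded to traditional psychotherapy showed significant improvement
“I saw Vietnamese people and tanks in the video”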

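Scenarios in Virtual Iraq
•City: desolate streets with old buildings, apartments that look about to collapse,
warehouses, mosques, factories, and so on. Two versions: one with almost no people
or traffic, and one with many people and heavy traffic

•City building interiors: some city buildings have modeled interiors so that the patient
can go inside. A building can be left empty or set to have
few or many occupants

•Checkpoint: part of the city scenario; a checkpoint where vehicles
stop before entering the city.

•Small rural village: recreates a small village with dilapidated buildings and
battle debris. There is a lot of vegetation, and the desert is visible
in the distance between the buildings

•Desert base: recreates a desert base with soldiers, tents, military equipment,
and so on.

•Desert road: an unpaved road environment that leads to the city, small rural village,
and desert base scenarios. Made up of desert dunes, vegetation, old
buildings, battle debris, people by the roadside, and so on.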
Fig. 1. Outskirts of Virtual Iraq City
Fig. 2. Center Area of Virtual Iraq City
Fig. 3. Car Bombing in Virtual Iraq City
User-Centered tests with the application were conducted at
the Naval Medical Center-San Diego and within an Army
Combat Stress Control Team in Iraq (See Figure 8). This
[...] usability of the prototype system application that fed an
iterative design process. A clinical trial version of the
application built from this process is currently being tested
with PTSD-diagnosed personnel at a variety of sites. The [...]
Fig. 4. Interior view of Desert Road Humvee Scenario
Fig. 5. Turret view of Desert Road Humvee Scenario
Fig. 6. IED Attack in Desert Road Humvee Scenario

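Wizard of Oz:

Recreating war through sight, touch, sound, and smell
• The therapist controls every aspect of the patient's situation in real time ('Wizard of Oz')

• Recreates the patient's actual traumatic situation as closely as possible

• Controls visual, auditory, olfactory, and tactile conditions

• Various military vehicles / blowing up nearby buildings, cars, tanks, and so on

• Aircraft or helicopters appearing overhead, day/night, rain/fog

• A wide range of situations can be recreated

• A firefight, being ambushed, incoming rockets

• A comrade killed or wounded, seeing human bodies or remains

• Having fired on enemy combatants or civilians, and so on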

[...] scores at baseline, post treatment and 3-month follow-up are in Figure 4. For this same
group, mean Beck Anxiety Inventory scores significantly decreased 33% from 18.6
(9.5) to 11.9 (13.6), (t=3.37, df=19, p < .003) and mean PHQ-9 (depression) scores
decreased 49% from 13.3 (5.4) to 7.1 (6.7), (t=3.68, df=19, p < 0.002) (see Figure 5).
Figure 4. PTSD Checklist scores across treatment. Figure 5. BAI and PHQ-Depression scores.
The average number of sessions for this sample was just under 11. Also, two of the
successful treatment completers had documented mild and moderate traumatic brain
injuries, which suggest that this form of exposure can be usefully applied with this [...]
PTSD Checklist scores across treatment
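• In the study, the 20 patients showed significant improvement overall

• Mean PCL-M score across all patients fell from 54.4 to 35.6

• 16 of the 20 were found to no longer have PTSD immediately after treatment

• The patients' condition was maintained 3 months after the end of treatment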
http://www.ncbi.nlm.nih.gov/pubmed/19377167

[...] treatment and 3-month follow-up are in Figure 4. For this same
group, mean Beck Anxiety Inventory scores significantly decreased 33% from 18.6
(9.5) to 11.9 (13.6), (t=3.37, df=19, p < .003) and mean PHQ-9 (depression) scores
decreased 49% from 13.3 (5.4) to 7.1 (6.7), (t=3.68, df=19, p < 0.002) (see Figure 5).
Figure 4. PTSD Checklist scores across treatment. Figure 5. BAI and PHQ-Depression scores.
The average number of sessions for this sample was just under 11. Also, two of the
successful treatment completers had documented mild and moderate traumatic brain
injuries, which suggest that this form of exposure can be usefully applied with this [...]
BAI and PHQ-Depression scores
• Mean Beck Anxiety Inventory score fell 33%, from 18.6 to 11.9

• PHQ-9 depression score also fell 49%, from 13.3 to 7.1

• Treatment was also successful in 2 patients with mild and moderate traumatic brain injury
http://www.ncbi.nlm.nih.gov/pubmed/19377167

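• PureTech Health

• A company that aims to be 'a new kind of pharmaceutical company'

• Develops not only conventional drugs but also Digital Therapeutics based on games, apps, and so on

• Digital Therapeutics have recently received US FDA de novo clearance as well

• PureTech Health

• The drug pipeline includes conventional small molecules, but also:

• Akili: a game (Project EVO) aimed at improving cognitive ability in ADHD, depression, Alzheimer's disease, and so on

• Sonde: uses voice biomarkers to diagnose and monitor mental health conditions such as depression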

Video game training enhances cognitive control in
older adults
J. A. Anguera1,2,3, J. Boccanfuso1,3, J. L. Rintoul1,3, O. Al-Hashimi1,2,3, F. Faraji1,3, J. Janowich1,3, E. Kong1,3, Y. Larraburo1,3,
C. Rolle1,3, E. Johnston1 & A. Gazzaley1,2,3,4
Cognitive control is defined by a set of neural processes that allow us to
interact with our complex environment in a goal-directed manner [1].
Humans regularly challenge these control processes when attempting
to simultaneously accomplish multiple goals (multitasking), generating
interference as the result of fundamental information processing
limitations [2]. It is clear that multitasking behaviour has become
ubiquitous in today’s technologically dense world [3], and substantial
evidence has accrued regarding multitasking difficulties and cognitive
control deficits in our ageing population [4]. Here we show that
multitasking performance, as assessed with a custom-designed
three-dimensional video game (NeuroRacer), exhibits a linear age-related
decline from 20 to 79 years of age. By playing an adaptive version of
NeuroRacer in multitasking training mode, older adults (60 to 85
years old) reduced multitasking costs compared to both an active
control group and a no-contact control group, attaining levels beyond
those achieved by untrained 20-year-old participants, with gains
persisting for 6 months. Furthermore, age-related deficits in neural
signatures of cognitive control, as measured with
electroencephalography, were remediated by multitasking training (enhanced midline
frontal theta power and frontal–posterior theta coherence). Critically,
this training resulted in performance benefits that extended to untrained
cognitive control abilities (enhanced sustained attention and working
memory), with an increase in midline frontal theta power predicting
the training-induced boost in sustained attention and preservation
of multitasking improvement 6 months later. These findings
highlight the robust plasticity of the prefrontal cognitive control system
in the ageing brain, and provide the first evidence, to our knowledge,
of how a custom-designed video game can be used to assess cognitive
abilities across the lifespan, evaluate underlying neural mechanisms,
and serve as a powerful tool for cognitive enhancement.
In a first experiment, we evaluated multitasking performance across
the adult lifespan. A total of 174 participants spanning six decades of life
(ages 20–79; ~30 individuals per decade) played a diagnostic version of
NeuroRacer to measure their perceptual discrimination ability (‘sign task’)
with and without a concurrent visuomotor tracking task (‘driving task’; see
Supplementary Information for details of NeuroRacer). Performance
was evaluated using two distinct game conditions: ‘sign only’ (respond
as rapidly as possible to the appearance of a sign only when a green circle
was present); and ‘sign and drive’ (simultaneously perform the sign task
while maintaining a car in the centre of a winding road using a joystick
(that is, ‘drive’; see Fig. 1a)). Perceptual discrimination performance was
evaluated using the signal detection metric of discriminability (d′). A ‘cost’
index was used to assess multitasking performance by calculating the
percentage change in d′ from ‘sign only’ to ‘sign and drive’, such that
greater cost (that is, a more negative percentage cost) indicates increased
interference when simultaneously engaging in the two tasks (see Methods
Summary).
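The cost index described above can be sketched in a few lines. The d′ formula used here is the textbook signal-detection definition, z(hit rate) minus z(false-alarm rate); the function names and example numbers are assumptions for illustration, and the paper's exact computation is given in its Methods, which are not reproduced in these slides.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Discriminability d' = z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def multitasking_cost(d_sign_only: float, d_sign_and_drive: float) -> float:
    """Percentage change in d' from 'sign only' to 'sign and drive'.

    A more negative value means more interference from multitasking.
    """
    return (d_sign_and_drive - d_sign_only) / d_sign_only * 100.0


# Hypothetical d' values, for illustration only.
cost = multitasking_cost(d_sign_only=2.0, d_sign_and_drive=1.5)
```

With these made-up values, d′ drops by a quarter under multitasking, so the cost is negative, matching the sign convention in the text.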
Prior to the assessment of multitasking costs, an adaptive staircase
algorithm was used to determine the difficulty levels of the game at
which each participant performed the perceptual discrimination and
visuomotor tracking tasks in isolation at ~80% accuracy. These levels
were then used to set the parameters of the component tasks in the
multitasking condition, so that each individual played the game at a
customized challenge level. This ensured that comparisons would inform
differences in the ability to multitask, and not merely reflect disparities in
component skills (see Methods, Supplementary Figs 1 and 2, and
Supplementary Information for more details).
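The excerpt does not say which staircase rule NeuroRacer uses, so the following is only a generic sketch of one well-known option, a weighted up-down rule. Difficulty rises by one step after a correct response and falls by a larger step after an error; choosing the ratio of the two steps as target / (1 - target) makes the level settle where accuracy is about the target (here 80%). All names, step sizes, and level bounds are hypothetical.

```python
def next_level(level, correct, target=0.8, step=1.0, lo=0.0, hi=100.0):
    """One update of a weighted up-down staircase (illustrative only).

    At equilibrium, target * step_up equals (1 - target) * step_down,
    so the down step is step * target / (1 - target), about 4 steps at 80%.
    """
    if correct:
        level += step                            # make the task harder
    else:
        level -= step * target / (1.0 - target)  # make the task easier
    return min(hi, max(lo, level))               # keep level within bounds


def run_staircase(start, responses, **kwargs):
    """Apply next_level over a sequence of correct/incorrect responses."""
    levels = []
    level = start
    for correct in responses:
        level = next_level(level, correct, **kwargs)
        levels.append(level)
    return levels


levels = run_staircase(10.0, [True, True, True, True, False])
```

In this toy run, four correct responses raise the level by four steps and one error brings it back to roughly where it started, which is the balance point for 80% accuracy.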
Multitasking performance diminished significantly across the adult
lifespan in a linear fashion (that is, increasing cost, see Fig. 2a and
Supplementary Table 1), with the only significant difference in cost between
adjacent decades being the increase from the twenties (−26.7% cost) to
the thirties (−38.6% cost). This deterioration in multitasking
performance is consistent with the pattern of performance decline across the
lifespan observed for fluid cognitive abilities, such as reasoning [5] and
working memory [6]. Thus, using NeuroRacer as a performance
assessment tool, we replicated previously evidenced age-related multitasking
deficits [7,8], and revealed that multitasking performance declines linearly
as we advance in age beyond our twenties.
In a second experiment, we explored whether older adults who trained
by playing NeuroRacer in multitasking mode would exhibit
improvements in their multitasking performance on the game [9,10] (that is, diminished
NeuroRacer costs). Critically, we also assessed whether this training
1 Department of Neurology, University of California, San Francisco, California 94158, USA. 2 Department of Physiology, University of California, San Francisco, California 94158, USA. 3 Center for Integrative
Neuroscience, University of California, San Francisco, California 94158, USA. 4 Department of Psychiatry, University of California, San Francisco, California 94158, USA.
[Figure 1 panels: a, game conditions ‘Drive only’, ‘Sign only’, and ‘Sign and drive’; b, initial visit (NeuroRacer, EEG and cognitive testing), then training intervention (multitasking, single task, or no-contact control; 1 hour × 3 times per week × 1 month), then repeat testing at 1 month and at 6+ months.]
Figure 1 | NeuroRacer experimental conditions and training design.
a, Screen shot captured during each experimental condition. b, Visualization of
training design and measures collected at each time point.
5 SEPTEMBER 2013 | VOL 501 | NATURE | 97
©2013 Macmillan Publishers Limited. All rights reserved

Video game training enhances cognitive control in older adults
https://www.youtube.com/watch?v=1xPX8F_wl0c

transferred to enhancements in their cognitive control abilities [11] beyond
those attained by participants who trained on the component tasks in
isolation. In designing the multitasking training version of NeuroRacer,
during game play as a key mechanistic feature of the tr
In addition, although cost reduction was observed o
group, equivalent improvement in component task sk
by both STT and MTT (see Supplementary Figs 4 and
that enhanced multitasking ability was not solely the r
component skills, but a function of learning to res
generated by the two tasks when performed concurr
the d′ cost improvement following training was not th
trade-off, as driving performance costs also diminish
group from pre- to post-training (see Supplementa
Notably in the MTT group, the multitasking pe
remained stable 6 months after training without boo
6 months, −21.9% cost). Interestingly, the MTT grou
cost improved significantly beyond the cost level attai
20 year olds who played a single session of NeuroRac
experiment 3; P < 0.001).
Next, we assessed if training with NeuroRacer le
enhancements of cognitive control abilities that are know
in ageing (for example, sustained attention, divided a
memory; see Supplementary Table 2) [12]. We hypoth
immersed in a challenging, adaptive, high-interferen
for a prolonged period of time (that is, MTT) would
cognitive performance on untrained tasks that also dem
control. Consistent with our hypothesis, significant
interactions and subsequent follow-up analyses eviden
training improvements in both working memory (de
task with and without distraction [7]; Fig. 3a, b) and su
[Figure 2 panels: a, Experiment 1: lifespan, multitasking cost (d′) by decade (20s to 70s); b, Experiment 2: training, multitasking cost at the initial visit, 1 month later, and 6 months later for multitasking training, single task training, and no-contact control. y-axis: multitasking cost (d′), 0% to –100%.]
Figure 2 | NeuroRacer multitasking costs. a, Costs across the lifespan
(n = 174) increased (that is, a more negative percentage) in a linear fashion
when participants were grouped by decade (F(1,5) = 135.7, P < 0.00001) or
analysed individually (F(1,173) = 42.8, r = 0.45, P < 0.00001; see
Supplementary Fig. 3), with significant increases in cost observed for all age
groups versus the 20-year-old group (P < 0.05 for each decade comparison).
b, Costs before training, 1 month post-training, and 6 months post-training
showed a session × group interaction (F(4,72) = 7.17, P < 0.0001, Cohen’s
d = 1.10), with follow-up analyses supporting a differential benefit for the
MTT group (Cohen’s d for MTT vs STT = 1.02; MTT vs NCC = 1.20).
†P < 0.05 within group improvement from pre to post, *P < 0.05 between
groups (n = 46). Error bars represent s.e.m.
[Figure residue, cropped in the source: panel a, ‘Pre–post WM task with distractions (RT)’; panel b, title truncated (‘Pre–p[...] without d[...]’); y-axis: RT difference (ms), –100 to 200.]
RESEARCH LETTER
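• Demonstrated that a game can improve cognitive ability (multitasking ability) in older adults

• 46 participants aged 60-85 took part; training with NeuroRacer lasted 4 weeks

• As a result, the trained participants came to outperform untrained 20-year-olds,

• and the ability was retained 6 months later without further practice.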
Nature 501, 97–101 (2013)

(vigilance; test of variables of attention (T
group (Fig. 3c; see Supplementary Table
several statistical trends suggestive of impro
ance on other cognitive control tasks (dual-
and change detection task; see analysis of co
in Supplementary Table 2). Note that alth
and sustained attention improvements w
rapid responses to test probes, neither im
alternative version of the TOVA) nor accu
cant group differences, revealing that traini
of a speed/accuracy trade-off. Importantl
ments were specific to working memory a
cesses, and not the result of generalized incr
as no group X session interactions were fou
tasks (a stimulus detection task and the dig
see Supplementary Table 2). Finally, only
significant correlation between multitaski
with NeuroRacer) and improvements on an
task (delayed-recognition with distraction
(Fig. 3d).
These important ‘transfer of benefits’ sug
lying mechanism of cognitive control was c
MTT with NeuroRacer. To assess this furth
basis of training effects by quantifying even
tions (ERSP) and long-range phase coheren
of each sign presented during NeuroRacer
We specifically assessed midline frontal the
EEG measure of cognitive control (for exam
tained attention [15] and interference resolutio
prefrontal cortex. In addition, we analysed
between frontal and posterior brain region
measure also associated with cognitive con
memory [14] and sustained attention [15]). Se
power and coherence each revealed signifi
1 5 10
*
)
Initial
Older adults Younger adults
†
Midline frontal theta
Power(dB)
Initial
*
a
Older adults Younger adults
Older adult post-training
Single task
training
Multitasking
training
No-contact
control
3.40
3.05
2.70
2.35
1.65
1.30
0.95
0.60
0.25
–0.10
–0.45
–0.80
–1.15
–1.50
2.00
Nature 501, 97–101 (2013)
• The improvement in cognitive ability was also observed in brain activity.

• In the older-adult training group, activity in the prefrontal cortex, which governs cognitive control, was observed to increase as skill improved.

OPEN
ORIGINAL ARTICLE
Characterizing cognitive control abilities in children with
16p11.2 deletion using adaptive ‘video game’ technology: a
pilot study
JA Anguera1,2, AN Brandes-Aitken1, CE Rolle1, SN Skinner1, SS Desai1, JD Bower3, WE Martucci3, WK Chung4, EH Sherr1,5 and EJ Marco1,2,5
Assessing cognitive abilities in children is challenging for two primary reasons: lack of testing engagement can lead to low testing
sensitivity and inherent performance variability. Here we sought to explore whether an engaging, adaptive digital cognitive
platform built to look and feel like a video game would reliably measure attention-based abilities in children with and without
neurodevelopmental disabilities related to a known genetic condition, 16p11.2 deletion. We assessed 20 children with 16p11.2
deletion, a genetic variation implicated in attention deficit/hyperactivity disorder and autism, as well as 16 siblings without the
deletion and 75 neurotypical age-matched children. Deletion carriers showed significantly slower response times and greater
response variability when compared with all non-carriers; by comparison, traditional non-adaptive selective attention assessments
were unable to discriminate group differences. This phenotypic characterization highlights the potential power of administering
tools that integrate adaptive psychophysical mechanics into video-game-style mechanics to achieve robust, reliable measurements.
Translational Psychiatry (2016) 6, e893; doi:10.1038/tp.2016.178; published online 20 September 2016
INTRODUCTION
Cognition is typically associated with measures of intelligence (for example, intellectual quotient (IQ)[1]), and is a reflection of one’s ability to perform higher-level processes by engaging specific mechanisms associated with learning, memory and reasoning. Such acts require the engagement of a specific subset of cognitive resources called cognitive control abilities,[2–5] which engage the underlying neural mechanisms associated with attention, working memory and goal-management faculties.[6] These abilities are often assessed with validated pencil-and-paper approaches or, now more commonly with these same paradigms deployed on either desktop or laptop computers. These approaches are often less than ideal when assessing pediatric populations, as children have highly varied degree of testing engagement, leading to low test sensitivity.[7–9] This is especially concerning when characterizing clinical populations, as increased performance variability in these groups often exceeds the range of testing sensitivity,[7–9] limiting the ability to characterize cognitive deficits in certain populations. A proper assessment of cognitive control abilities in children is especially important, as these abilities allow children to interact with their complex environment in a goal-directed manner,[10] are predictive of academic performance[11] and are correlated with overall quality of life.[12] For pediatric clinical populations, this characterization is especially critical as they are often assessed in an indirect fashion through intelligence quotients, parent report questionnaires[13] and/or behavioral challenges,[14] each of which fail to properly characterize these abilities in a direct manner.
One approach to make testing more robust and user-friendly is to present material in an optimally engaging manner, a strategy particularly beneficial when assessing children. The rise of digital health technologies facilitates the ability to administer these types of tests on tablet-based technologies (that is, iPad) in a game-like manner.[15] For instance, Dundar and Akcayir[16] assessed tablet-based reading compared with book reading in school-aged children, and discovered that students preferred tablet-based reading, reporting it to be more enjoyable. Another approach used to optimize the testing experience involves the integration of adaptive staircase algorithms, as the incorporation of such approaches lead to more reliable assessments that can be completed in a timely manner. This approach, rooted in psychophysical research,[17] has been a powerful way to ensure that individuals perform at their ability level on a given task, mitigating the possibility of floor/ceiling effects. With respect to assessing individual abilities, the incorporation of adaptive mechanics acts as a normalizing agent for each individual in accordance with their underlying cognitive abilities,[18] facilitating fair comparisons between groups (for example, neurotypical and study populations).
Adaptive mechanics in a consumer-style video game experience could potentially assist in the challenge of interrogating cognitive abilities in a pediatric patient population. This synergistic approach would seemingly raise one’s level of engagement by making the testing experience more enjoyable and with greater sensitivity to individual differences, a key aspect typically missing in both clinical and research settings when testing these populations. Video game approaches have previously been utilized in clinical adult populations (for example, stroke,[19,20]
1 Department of Neurology, University of California, San Francisco, San Francisco, CA, USA; 2 Department of Psychiatry, University of California, San Francisco, San Francisco, CA, USA; 3 Akili Interactive Labs, Boston, MA, USA; 4 Department of Pediatrics, Columbia University Medical Center, New York, NY, USA and 5 Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA.
Correspondence: JA Anguera or EJ Marco, University of California, San Francisco, Mission Bay – Sandler Neurosciences Center, UCSF MC 0444, 675 Nelson Rising Lane, Room 502, San Francisco, CA 94158, USA.
E-mail: joaquin.anguera@ucsf.edu or elysa.marco@ucsf.edu
Received 6 March 2016; revised 13 July 2016; accepted 18 July 2016
Citation: Transl Psychiatry (2016) 6, e893; doi:10.1038/tp.2016.178
www.nature.com/tp
Figure 2. Project: EVO selective attention performance. (a) EVO single- and multi-tasking response time performance f
non-affected siblings and non-affected control groups). (b) EVO multi-tasking RT. (c) Visual search task performance
•Using Project EVO (a game),

•carriers of a specific genotype associated with childhood attention disorder can be identified

•Significant difference between carriers and non-carriers based on in-game response time

RESEARCH ARTICLE
A pilot study to determine the feasibility of
enhancing cognitive abilities in children with
sensory processing dysfunction
Joaquin A. Anguera1,2☯*, Anne N. Brandes-Aitken1☯, Ashley D. Antovich1, Camarin E. Rolle1, Shivani S. Desai1, Elysa J. Marco1,2,3
1 Department of Neurology, University of California, San Francisco, United States of America, 2 Department
of Psychiatry, University of California, San Francisco, United States of America, 3 Department of Pediatrics,
University of California, San Francisco, United States of America
☯ These authors contributed equally to this work.
* joaquin.anguera@ucsf.edu
Abstract
Children with Sensory Processing Dysfunction (SPD) experience incoming information in
atypical, distracting ways. Qualitative challenges with attention have been reported in these
children, but such difficulties have not been quantified using either behavioral or functional
neuroimaging methods. Furthermore, the efficacy of evidence-based cognitive control inter-
ventions aimed at enhancing attention in this group has not been tested. Here we present
work aimed at characterizing and enhancing attentional abilities for children with SPD. A
sample of 38 SPD and 25 typically developing children were tested on behavioral, neural,
and parental measures of attention before and after a 4-week iPad-based at-home cognitive
remediation program. At baseline, 54% of children with SPD met or exceeded criteria on a
parent report measure for inattention/hyperactivity. Significant deficits involving sustained
attention, selective attention and goal management were observed only in the subset of
SPD children with parent-reported inattention. This subset of children also showed reduced
midline frontal theta activity, an electroencephalographic measure of attention. Following
the cognitive intervention, only the SPD children with inattention/hyperactivity showed both
improvements in midline frontal theta activity and on a parental report of inattention. Notably,
33% of these individuals no longer met the clinical cut-off for inattention, with the parent-
reported improvements persisting for 9 months. These findings support the benefit of a
targeted attention intervention for a subset of children with SPD, while simultaneously
highlighting the importance of having a multifaceted assessment for individuals with neuro-
developmental conditions to optimally personalize treatment.
Introduction
Five percent of all children suffer from Sensory Processing Dysfunction (SPD)[1], with these
individuals exhibiting exaggerated aversive, withdrawal, or seeking behaviors associated with
sensory inputs [2]. These sensory processing differences can have significant and lifelong con-
sequences for learning and social abilities, and are often shared by children who meet
PLOS ONE | https://doi.org/10.1371/journal.pone.0172616 April 5, 2017 1 / 19
OPEN ACCESS
Citation: Anguera JA, Brandes-Aitken AN, Antovich
AD, Rolle CE, Desai SS, Marco EJ (2017) A pilot
study to determine the feasibility of enhancing
cognitive abilities in children with sensory
processing dysfunction. PLoS ONE 12(4):
e0172616. https://doi.org/10.1371/journal.
pone.0172616
Editor: Jacobus P. van Wouwe, TNO,
NETHERLANDS
Received: October 5, 2016
Accepted: February 1, 2017
Published: April 5, 2017
Copyright: © 2017 Anguera et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: All relevant data are
within the paper and its Supporting Information
files.
Funding: This work was supported by the
Mickelson-Brody Family Foundation, the Wallace
Research Foundation, the James Gates Family
Foundation, the Kawaja-Holcombe Family
Foundation (EJM), and the SNAP 2015 Crowd
funding effort.
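•Tested in 20 children with sensory processing dysfunction (SPD) who also had ADHD

•After playing the Project EVO game for 4 weeks (5 days a week, 25 minutes),

•7 of the 20 improved enough that they no longer met the criteria for ADHD

•The effect lasted for at least 9 months after use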
Fig 4. Transfer effect on behavioral and parent report measures. Pre and post (A) response time (B) and resp
revealing within group change. Error bars indicate standard error of the mean. Within group main effects of session
= p .05, ** =.p .01. Sun symbols indicate statistically significant instances where SPD+IA post-training performa
TDC group prior to training. (C) Vanderbilt parent report inattention change bar plot (calculated by pre-post margina
significant group x session interaction. Error bars indicate standard error of the mean. All group x session interactio
stars (* = p .05, ** =.p .01) on bar graph.
https://doi.org/10.1371/journal.pone.0172616.g004

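•For ADHD, a large-scale phase III RCT is under way, with the goal of FDA clearance as a medical device

•Patients aged 8-12 (n=330), with a video game that has no therapeutic effect as the control group

•primary endpoint: TOVA

•Goal: an ADHD treatment game prescribed by physicians and covered by insurers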

Depression treatment: clinical results
Study period: October 2014 to December 2016
N=96, 30 minutes of stimulation per session
Primary Outcome: Montgomery-Åsberg Depression Rating Scale (MADRS)
Secondary Outcome: Beck Depression Inventory II
[Charts: MADRS and Beck Depression Inventory II scores (axis 0 to 40; bands None, Mild, Moderate, Severe) at BASELINE, 2 WEEK, 4 WEEK and 6 WEEK, Ybrain vs SSRI. SSRI: 42 consecutive doses over 6 weeks. Ybrain sessions: 5 / 1 / 1 / 5.]
Courtesy of 이기원 (CEO), YBrain
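•A 2-year double-blinded randomised clinical study in 96 patients in Korea

•Treatment group: sham drug + real stimulation device

•Control group: real drug + sham stimulation device

•On the MADRS scale (the Primary Outcome), the device fell slightly short of the drug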

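•On MADRS (the Primary Outcome), efficacy was slightly lower than the existing drug

•On BDI (the Secondary Outcome), results were equivalent to the existing drug

•Based on these results, the Korean Ministry of Food and Drug Safety (MFDS) approved it as a ‘Class 3 adjunctive medical device’

•In principle, therefore, it will be used in patients who are already taking antidepressants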

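•Transcranial direct current stimulation (tDCS)

•March 2017: first in Korea to receive MFDS approval as a Class 3 adjunctive medical device

•European CE approval expected in July

•Aiming for FDA approval within 2-3 years

•Additional clinical studies planned

•Depression

•Pilot program under way to treat depression in older adults living alone

•Clinical trial with Harvard Medical School in 500 participants in Asia planned from October

•Clinical trial for mild cognitive impairment planned

•Schizophrenia: first clinical trial being wrapped up, paper to be published

•New health technology assessment planned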

RespeRate
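• The only FDA-cleared non-drug treatment for hypertension
• Lowers blood pressure through sessions of therapeutic breathing
• Significant blood pressure reduction demonstrated with 15-minute sessions a few times a week
• Used by more than 250,000 people worldwide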

2breathe
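• Among digital devices, 2breathe is the only one intended for sleep induction
• The sleep-inducing effect was discovered as a ‘side effect’ of the hypertension treatment device
• Safety demonstrated through clinical trials in hundreds of thousands of patients
• Induces relaxation and sleep by reducing sympathetic nervous system activation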

2breathe
https://www.youtube.com/watch?v=u7qVC62etmI

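Sleep disorders
Insomnia
Breathing disorders
Narcolepsy, restless legs syndrome…
Stress
Lifestyle (exercise, diet, etc.)
Unknown cause
Non-drug treatment / Drug treatment
Sleep apnea
Snoring
75% of people who snore have sleep apnea syndrome
Apnea: breathing stops for 10 seconds or longer
Hypopnea: airflow reduced to 50% or less, with oxygen saturation falling by 4% or more
Sleep apnea syndrome: 5 or more apneas per hour, 30 or more per 7 hours
Central: impaired transmission of the brain's breathing signal
Obstructive: a narrow upper airway is physically blocked
Mixed: central + obstructive
Reduced sleep quality
Fatigue, daytime sleepiness, depression, headache
Arrhythmia, hypertension, heart disease, lung disease
Non-surgical treatment: weight loss, lifestyle changes, medication, oral appliances
Surgical treatment: nasal surgery, pharyngeal surgery, tongue-base reduction surgery, etc.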
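
The event definitions on this slide can be written down as a small rule set. The sketch below is only an illustration of those thresholds: the function names, the `airflow_fraction` representation (airflow relative to baseline), and the choice to treat the per-hour and per-7-hours counts as alternative criteria are assumptions, not part of the slide or of any clinical scoring standard.

```python
# Thresholds taken from the slide; everything else is a hypothetical sketch.
APNEA_MIN_SECONDS = 10      # breathing stops for 10 s or longer
HYPOPNEA_MAX_AIRFLOW = 0.5  # airflow at 50% of baseline or less
HYPOPNEA_MIN_SPO2_DROP = 4  # oxygen saturation falls by 4 points or more
SYNDROME_PER_HOUR = 5       # 5 or more apneas per hour
SYNDROME_PER_7_HOURS = 30   # 30 or more apneas per 7 hours


def classify_event(duration_s, airflow_fraction, spo2_drop):
    """Label one breathing event as 'apnea', 'hypopnea' or 'normal'.

    airflow_fraction is airflow relative to baseline (0.0 = fully stopped).
    """
    if airflow_fraction == 0.0 and duration_s >= APNEA_MIN_SECONDS:
        return "apnea"
    if airflow_fraction <= HYPOPNEA_MAX_AIRFLOW and spo2_drop >= HYPOPNEA_MIN_SPO2_DROP:
        return "hypopnea"
    return "normal"


def meets_syndrome_criteria(apnea_count, sleep_hours):
    """Check the slide's frequency thresholds (treated here as alternatives)."""
    # Cross-multiplied comparisons avoid floating-point division issues.
    per_hour_ok = apnea_count >= SYNDROME_PER_HOUR * sleep_hours
    per_7h_ok = apnea_count * 7 >= SYNDROME_PER_7_HOURS * sleep_hours
    return per_hour_ok or per_7h_ok


if __name__ == "__main__":
    print(classify_event(12, 0.0, 0))      # a 12-second full stop
    print(meets_syndrome_criteria(30, 7))  # 30 apneas in 7 hours of sleep
```

A real scorer would work on polysomnography signals and follow a published scoring manual; the point here is only how the slide's numbers combine.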

Sleep-related digital health

Smartphone: Sleep Cycle, white-noise apps, etc.
Devices
Wearables
Wrist/finger: Fitbit, Mi band, Basis Peak (band), Oura Ring (ring)
Head: Zeo, Frasen (프라센)
Abdomen: 2breathe
Bed-related
Under the mattress/pillow: Beddit, EarlySense (SleepSense), Nora
Bed: MODD, Sleep Number, Tempur
Bedroom devices: Withings Aura, ResMed S+

슬립셋
•Plays white noise and similar sounds to induce mental calm

•Hard to regard as medically proven

•Sleep report

•Insomnia, stress, parenting and prenatal care, learning ability

Fitbit
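•Most activity trackers, including Fitbit, include a sleep monitoring function

•Monitoring is based on movement (accelerometer, gyroscope) and heart rate

•Sleep monitoring is therefore not accurate, and equivalence with polysomnography (PSG) has not been demonstrated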
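
One common way to examine agreement with PSG is an epoch-by-epoch comparison of sleep/wake labels. The sketch below is a minimal, hypothetical illustration of that idea: the function name, the `"sleep"`/`"wake"` label strings, and the sample data are all assumptions, and it does not describe how Fitbit or any other vendor validates its devices.

```python
def epoch_agreement(device, psg):
    """Compare per-epoch 'sleep'/'wake' labels from a device against PSG.

    PSG is treated as the reference. Returns sensitivity (sleep epochs the
    device also called sleep), specificity (wake epochs the device also
    called wake) and overall accuracy.
    """
    if len(device) != len(psg):
        raise ValueError("device and PSG must cover the same epochs")

    tp = fp = tn = fn = 0
    for d, p in zip(device, psg):
        if p == "sleep":
            if d == "sleep":
                tp += 1
            else:
                fn += 1
        else:
            if d == "sleep":
                fp += 1
            else:
                tn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("PSG reference needs both sleep and wake epochs")

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(psg),
    }


if __name__ == "__main__":
    # Made-up labels for six epochs, for illustration only.
    psg = ["sleep", "sleep", "sleep", "sleep", "wake", "wake"]
    device = ["sleep", "sleep", "sleep", "sleep", "sleep", "wake"]
    print(epoch_agreement(device, psg))
```

In the made-up example the device catches every sleep epoch but misses half of the wake epochs, which is why sensitivity and specificity are worth reporting separately rather than accuracy alone.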

Fitbit
•Monitoring is based on movement (accelerometer, gyroscope) and heart rate

•Automatic sleep mode, total sleep time

•Asleep - restless - awake (three stages)

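Frasen (프라센)
•Uses a sleep mask to measure brain waves, heart rate, breathing, temperature, and eye and facial muscle movement

•Distinguishes deep sleep, light sleep and REM sleep

•Uses color patterns (LED) and sound to induce transitions between REM and non-REM (NREM) sleep

•Provides low-frequency 3D audio to induce deep-sleep brain waves

•As far as I know, not clinically proven; struggling to raise investment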

ResMed S+
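•Non-contact sleep monitoring sensor

•Placed near the bed, it monitors sleep through vibration, sound and similar signals

•Distinguishes Deep - Light - REM sleep - Disruption - Onset

•Not a medical device; agreement with PSG is also unknown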

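Table 1
Monitoring
Treatment/improvement
Insomnia / Breathing-related
(sleep induction) / (sleep apnea, snoring)
Sleep Cycle
White noise
Fitbit, Miband, Gear S
Oura Ring
Zeo
Beddit, EarlySense
ResMed S+
White noise

Frasen (프라센)
2breathe

MODD
Sleep Cycle

Beddit, EarlySense

2breathe
Not proven
Data available
No evidence of accuracy
Does not agree with PSG
Sleep apnea, snoring
Measurement is not the purpose, but breathing and sound are measured
Nora

Saver bed

Balluga etc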
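•Monitoring alone rarely delivers value

•The ‘So what?’ problem: monitoring ➞ must lead to treatment/improvement

•For monitoring: the accuracy of the data itself is hard to guarantee; results do not agree with PSG

•For treatment/improvement

•Efficacy and safety must be demonstrated

•Choice between medical device and non-medical device

(They are reportedly looking for clinical research partners)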

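FDA guidelines on mobile health
•July 2011: draft guidance on mobile medical applications

•October 2013: updated final guidance issued

•January 2015: guidance on low-risk devices intended for wellness

FDA guidelines on mobile health
Not every app and device has to be subject to FDA regulation, but
apps and devices that could threaten consumers' health if they fail to function properly
are held to the same strict level of regulation as conventional medical devices.
Even a medical device/app may go unregulated

if its risk is not high.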

http://www.nature.com/news/mental-health-there-s-an-app-for-that-1.19694
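• For mental health apps, the level of potential risk is often hard to judge.

• The boundary between medical device and wellness is unclear

• Apps that prevent, treat or diagnose disease

• Apps that ‘improve your mood’ or provide ‘coaching’?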

The Journal of Clinical Investigation CLINICAL MEDICINE
Introduction
Clinical laboratory testing plays a critical role in health care and
evidence-based medicine (1). Lab tests provide essential data
that support clinical decisions to screen, diagnose, and treat
health conditions (2). Most individuals encounter clinical testing
through their health care provider during a routine health assess-
ment or as a patient in a health care facility. However, individu-
als are increasingly playing more active roles in managing their
health, and some now seek direct access to laboratory testing for
self-guided assessment or monitoring (3–5).
In the USA, all clinical laboratory testing conducted on humans
is regulated by Centers for Medicare & Medicaid Services (CMS)
based on guidelines outlined in Clinical Laboratory Improvement
Amendments (CLIA) (6). To ensure analytical quality of labora-
tory methods, certified laboratories are required to participate in
periodic proficiency testing using a homogeneous batch of sam-
ples that are distributed to each laboratory from a CMS-approved
proficiency testing program. These programs assess the total
allowable error (TEa) that combines method bias and total impre-
cision for each analyte. Acceptability criteria are determined by
CLIA and/or the appropriate accrediting agency (7).
Direct-to-consumer service models now provide means for
individuals to obtain laboratory testing outside traditional health
care settings (4, 5). One company implementing this new model is
Theranos, which offers a blood testing service that uses capillary
tube collection and promises several advantages over traditional
venipuncture: lower collection volumes (typically ≤150 μl versus
≥1.5 ml), convenience, and reduced cost — on average about 5-fold
less than the 2 largest testing laboratories in the USA (Quest and
LabCorp) (8). However, availability of these services varies by
state, where access to offerings may be more or less restrictive
BACKGROUND. Clinical laboratory tests are now being prescribed and made directly available to consumers through retail
outlets in the USA. Concerns with these test have been raised regarding the uncertainty of testing methods used in these
venues and a lack of open, scientific validation of the technical accuracy and clinical equivalency of results obtained through
these services.
METHODS. We conducted a cohort study of 60 healthy adults to compare the uncertainty and accuracy in 22 common clinical
lab tests between one company offering blood tests obtained from finger prick (Theranos) and 2 major clinical testing services
that require standard venipuncture draws (Quest and LabCorp). Samples were collected in Phoenix, Arizona, at an ambulatory
clinic and at retail outlets with point-of-care services.
RESULTS. Theranos flagged tests outside their normal range 1.6× more often than other testing services (P < 0.0001). Of the
22 lab measurements evaluated, 15 (68%) showed significant interservice variability (P < 0.002). We found nonequivalent
lipid panel test results between Theranos and other clinical services. Variability in testing services, sample collection times,
and subjects markedly influenced lab results.
CONCLUSION. While laboratory practice standards exist to control this variability, the disparities between testing services
we observed could potentially alter clinical interpretation and health care utilization. Greater transparency and evaluation of
testing technologies would increase their utility in personalized health management.
FUNDING. This work was supported by the Icahn Institute for Genomics and Multiscale Biology, a gift from the Harris Family
Charitable Foundation (to J.T. Dudley), and grants from the NIH (R01 DK098242 and U54 CA189201, to J.T. Dudley, and R01
AG046170 and U01 AI111598, to E.E. Schadt).
Evaluation of direct-to-consumer low-volume lab tests
in healthy adults
Brian A. Kidd,1,2,3 Gabriel Hoffman,1,2 Noah Zimmerman,3 Li Li,1,2,3 Joseph W. Morgan,3 Patricia K. Glowe,1,2,3 Gregory J. Botwin,3 Samir Parekh,4 Nikolina Babic,5 Matthew W. Doust,6 Gregory B. Stock,1,2,3 Eric E. Schadt,1,2 and Joel T. Dudley1,2,3
1 Department of Genetics and Genomic Sciences, 2 Icahn Institute for Genomics and Multiscale Biology, 3 Harris Center for Precision Wellness, 4 Department of Hematology and Medical Oncology, and 5 Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, New York, USA. 6 Hope Research Institute (HRI), Phoenix, Arizona, USA.
Conflict of interest: J.T. Dudley owns equity in NuMedii Inc. and has received consulting
fees or honoraria from Janssen Pharmaceuticals, GlaxoSmithKline, AstraZeneca, and
LAM Therapeutics.
Role of funding source: Study funding provided by the Icahn Institute for Genomics
and Multiscale Biology and the Harris Center for Precision Wellness at the Icahn
School of Medicine at Mount Sinai. Salaries of B.A. Kidd, J.T. Dudley, and E.E. Schadt
Downloaded from http://www.jci.org on March 28, 2016. http://dx.doi.org/10.1172/JCI86318
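•A paper from Mt Sinai on the accuracy of Theranos

•Around July 2015, over 5 days, in 60 healthy adults

•22 test items were sent to Theranos and to two other testing services, and the results were compared

•Conclusion: the Theranos results were considerably inaccurate

•For cholesterol and similar tests, inaccurate enough to change a physician's diagnosis

•Across the tests overall, Theranos flagged results as outside the normal range 1.6 times more often

•15 of the 22 test items showed significant differences in results

•Another issue that the paper cannot resolve

•Whether the Edison device that Theranos ‘claimed’ to have developed in-house was actually used

•According to former employees quoted in the WSJ, by around July 2015

•Theranos was no longer using Edison but running diluted blood on existing devices from Siemens and others

•As expected(?), Theranos again responded that this was a flawed paper with a conflict of interest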

Across digital healthcare,

the importance of evidence is being recognized,

and well-designed clinical studies are increasing.

Successful weight reduction
and maintenance by using a
smartphone application in those
with overweight and obesity
Sang Ouk Chin1,*, Changwon Keum2,*, Junghoon Woo3, Jehwan Park2, Hyung Jin Choi4, Jeong-taek Woo5 & Sang Youl Rhee5
A discrepancy exists with regard to the effect of smartphone applications (apps) on weight reduction due to the several limitations of previous studies. This is a retrospective cohort study, aimed to investigate the effectiveness of a smartphone app on weight reduction in obese or overweight individuals, based on the complete enumeration study that utilized the clinical and logging data entered by Noom Coach app users between October 2012 and April 2014. A total of 35,921 participants were included in the analysis, of whom 77.9% reported a decrease in body weight while they were using the app (median 267 days; interquartile range=182). Dinner input frequency was the most important factor for successful weight loss (OR=10.69; 95% CI=6.20–19.53; p<0.001), and more frequent input of weight significantly decreased the possibility of experiencing the yo-yo effect (OR=0.59, 95% CI=0.39–0.89; p<0.001). This study demonstrated the clinical utility of an app for successful weight reduction in the majority of the app users; the effects were more significant for individuals who monitored their weight and diet more frequently.
Obesity is a global epidemic with a rapidly increasing prevalence worldwide[1,2]. As obese individuals experience significantly higher mortality when compared with the non-obese population[3,4], this phenomenon poses a significant socioeconomic burden, necessitating strategies to manage overweight and prevent obesity[5]. Although numerous interventions such as life style modification including exercise[6–10], and pharmacotherapy[11–13] have been shown effective for both the prevention and treatment of obesity, some of these methods were found to have a limitation which required substantial financial inputs and repeated time-consuming processes[14,15].
Recently, as the number of smartphone users is increasing dramatically, many investigators have attempted to implement smartphone applications (app) for health promotion[16–19]. Consequently, many smartphone apps have demonstrated at least partial efficacy in promoting successful weight reduction according to the number of previous studies[20–24]. However, due to the limitations associated with study design such as small-scale studies and short investigation periods, a discrepancy exists with regard to the effect of apps on weight reduction[20,21,23]. Even systemic reviews which investigated the efficacy of mobile apps for weight reduction reported more or less inconsistent results; Flores Mateo et al. reported a significant weight loss by mobile phone app intervention when compared with control groups[25] whereas Semper et al. reported that four of the six studies included in the analysis showed no significant difference of weight reduction between comparison groups[26]. Thus, the aim of this study was to investigate the effectiveness of a smartphone app on weight reduction in obese or overweight individuals
Received: [?]0 April 2016
Accepted: 15 September 2016
Published: [?]0 November 2016
OPEN
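•Can a smartphone app help with weight loss?

•Users who used the application for at least 6 months between 2012 and 2014

•Data from 35,921 people recruited in some 80 countries (USA, Germany, Korea, UK, Japan, etc.)

•Median duration of app use: 267 days
Chin et al. Sci Rep 2016
Paper demonstrating the weight-loss effect of a smartphone application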

Da Young Lee1,2, Jeongwoon Park3, Dooah Choi3, Hong-Yup Ahn4, Sung-Woo Park1 & Cheol-Young Park1
Huraypositive Inc.
OPEN
Paper on HbA1c reduction in patients with type 2 diabetes using a smartphone app

Clinical study to demonstrate the diabetes-prevention effect of a smartphone app

Video game training enhances cognitive control in
older adults
J. A. Anguera1,2,3, J. Boccanfuso1,3, J. L. Rintoul1,3, O. Al-Hashimi1,2,3, F. Faraji1,3, J. Janowich1,3, E. Kong1,3, Y. Larraburo1,3, C. Rolle1,3, E. Johnston1 & A. Gazzaley1,2,3,4
Cognitive control is defined by a set of neural processes that allow us to interact with our complex environment in a goal-directed manner[1]. Humans regularly challenge these control processes when attempting to simultaneously accomplish multiple goals (multitasking), generating interference as the result of fundamental information processing limitations[2]. It is clear that multitasking behaviour has become ubiquitous in today’s technologically dense world[3], and substantial evidence has accrued regarding multitasking difficulties and cognitive control deficits in our ageing population[4]. Here we show that multitasking performance, as assessed with a custom-designed three-dimensional video game (NeuroRacer), exhibits a linear age-related decline from 20 to 79 years of age. By playing an adaptive version of NeuroRacer in multitasking training mode, older adults (60 to 85 years old) reduced multitasking costs compared to both an active control group and a no-contact control group, attaining levels beyond those achieved by untrained 20-year-old participants, with gains persisting for 6 months. Furthermore, age-related deficits in neural signatures of cognitive control, as measured with electroencephalography, were remediated by multitasking training (enhanced midline frontal theta power and frontal–posterior theta coherence). Critically, this training resulted in performance benefits that extended to untrained cognitive control abilities (enhanced sustained attention and working memory), with an increase in midline frontal theta power predicting the training-induced boost in sustained attention and preservation of multitasking improvement 6 months later. These findings highlight the robust plasticity of the prefrontal cognitive control system in the ageing brain, and provide the first evidence, to our knowledge, of how a custom-designed video game can be used to assess cognitive abilities across the lifespan, evaluate underlying neural mechanisms, and serve as a powerful tool for cognitive enhancement.
In a first experiment, we evaluated multitasking performance across the adult lifespan. A total of 174 participants spanning six decades of life (ages 20–79; ~30 individuals per decade) played a diagnostic version of NeuroRacer to measure their perceptual discrimination ability (‘sign task’) with and without a concurrent visuomotor tracking task (‘driving task’; see Supplementary Information for details of NeuroRacer). Performance was evaluated using two distinct game conditions: ‘sign only’ (respond as rapidly as possible to the appearance of a sign only when a green circle was present); and ‘sign and drive’ (simultaneously perform the sign task while maintaining a car in the centre of a winding road using a joystick (that is, ‘drive’; see Fig. 1a)). Perceptual discrimination performance was evaluated using the signal detection metric of discriminability (d′). A ‘cost’ index was used to assess multitasking performance by calculating the percentage change in d′ from ‘sign only’ to ‘sign and drive’, such that greater cost (that is, a more negative percentage cost) indicates increased interference when simultaneously engaging in the two tasks (see Methods Summary).
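
The cost index in this paragraph is simple arithmetic on the discriminability index d′. The sketch below assumes the textbook signal-detection definition, d′ = z(hit rate) − z(false-alarm rate); the function names and the example rates are hypothetical, and the paper's exact computation (for instance, how rates of exactly 0 or 1 are handled) is not given in this excerpt.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Textbook d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def multitasking_cost(d_sign_only, d_sign_and_drive):
    """Percentage change in d' from 'sign only' to 'sign and drive'.

    A more negative value means more interference from the second task.
    """
    return (d_sign_and_drive - d_sign_only) / d_sign_only * 100.0


if __name__ == "__main__":
    single = d_prime(0.90, 0.10)  # hypothetical 'sign only' rates
    dual = d_prime(0.80, 0.15)    # hypothetical 'sign and drive' rates
    print(round(multitasking_cost(single, dual), 1))
```

Because the cost is normalized by each person's own single-task d′, it compares how much performance is lost under multitasking rather than how good the person is at the sign task to begin with.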
Prior to the assessment of multitasking costs, an adaptive staircase algorithm was used to determine the difficulty levels of the game at which each participant performed the perceptual discrimination and visuomotor tracking tasks in isolation at ~80% accuracy. These levels were then used to set the parameters of the component tasks in the multitasking condition, so that each individual played the game at a customized challenge level. This ensured that comparisons would inform differences in the ability to multitask, and not merely reflect disparities in component skills (see Methods, Supplementary Figs 1 and 2, and Supplementary Information for more details).
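
To make the idea of an adaptive staircase concrete, here is a minimal sketch of one classic rule: make the task harder after three consecutive correct responses and easier after any error. That particular rule is known to settle near 79.4% correct (0.5^(1/3)), which is close to the ~80% target mentioned above. It is an assumption chosen for illustration; this excerpt does not say which rule NeuroRacer uses, and the function name, level range and step size are hypothetical.

```python
def run_staircase(responses, start_level=5, min_level=1, max_level=10,
                  n_correct_to_step_up=3):
    """Return the difficulty level presented on each trial.

    responses: iterable of booleans (True = correct response).
    Higher level = harder. Difficulty rises after n_correct_to_step_up
    consecutive correct responses and falls after every error.
    """
    level = start_level
    streak = 0
    presented = []
    for correct in responses:
        presented.append(level)
        if correct:
            streak += 1
            if streak == n_correct_to_step_up:
                level = min(level + 1, max_level)  # harder
                streak = 0
        else:
            level = max(level - 1, min_level)      # easier
            streak = 0
    return presented


if __name__ == "__main__":
    trial_outcomes = [True, True, True, False, True, True, True, True]
    print(run_staircase(trial_outcomes))
```

The level a participant hovers around after many trials can then be used as that person's customized difficulty for the multitasking condition.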
Multitasking performance diminished significantly across the adult lifespan in a linear fashion (that is, increasing cost, see Fig. 2a and Supplementary Table 1), with the only significant difference in cost between adjacent decades being the increase from the twenties (−26.7% cost) to the thirties (−38.6% cost). This deterioration in multitasking performance is consistent with the pattern of performance decline across the lifespan observed for fluid cognitive abilities, such as reasoning[5] and working memory[6]. Thus, using NeuroRacer as a performance assessment tool, we replicated previously evidenced age-related multitasking deficits[7,8], and revealed that multitasking performance declines linearly as we advance in age beyond our twenties.
In a second experiment, we explored whether older adults who trained by playing NeuroRacer in multitasking mode would exhibit improvements in their multitasking performance on the game[9,10] (that is, diminished NeuroRacer costs). Critically, we also assessed whether this training
1 Department of Neurology, University of California, San Francisco, California 94158, USA. 2 Department of Physiology, University of California, San Francisco, California 94158, USA. 3 Center for Integrative Neuroscience, University of California, San Francisco, California 94158, USA. 4 Department of Psychiatry, University of California, San Francisco, California 94158, USA.
[Figure 1 panels: (a) Drive only, Sign only, Sign and drive; (b) initial visit (NeuroRacer, EEG and cognitive testing), training intervention (multitasking, single task or no-contact control; 1 hour × 3 times per week × 1 month), follow-up at 1 month and 6+ months]
Figure 1 | NeuroRacer experimental conditions and training design.
a, Screen shot captured during each experimental condition. b, Visualization of
training design and measures collected at each time point.
5 SEPTEMBER 2013 | VOL 501 | NATURE | 97
©2013 Macmillan Publishers Limited. All rights reserved
Paper on the cognitive-enhancement effect of a video game

Original Paper
Delivering Cognitive Behavior Therapy to Young Adults With
Conversational Agent (Woebot): A Randomized Controlled Trial
Kathleen Kara Fitzpatrick1*, PhD; Alison Darcy2*, PhD; Molly Vierhile1, BA
1 Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
2 Woebot Labs Inc., San Francisco, CA, United States
* these authors contributed equally
Corresponding Author:
Alison Darcy, PhD
Woebot Labs Inc.
55 Fair Avenue
San Francisco, CA, 94110
United States
Email: alison@woebot.io
Abstract
Background: Web-based cognitive-behavioral therapeutic (CBT) apps have demonstrated efficacy but are characterized by
poor adherence. Conversational agents may offer a convenient, engaging way of getting support at any time.
Objective: The objective of the study was to determine the feasibility, acceptability, and preliminary efficacy of a fully automated
conversational agent to deliver a self-help program for college students who self-identify as having symptoms of anxiety and
depression.
Methods: In an unblinded trial, 70 individuals age 18-28 years were recruited online from a university community social media
site and were randomized to receive either 2 weeks (up to 20 sessions) of self-help content derived from CBT principles in a
conversational format with a text-based conversational agent (Woebot) (n=34) or were directed to the National Institute of Mental
Health ebook, “Depression in College Students,” as an information-only control group (n=36). All participants completed
Web-based versions of the 9-item Patient Health Questionnaire (PHQ-9), the 7-item Generalized Anxiety Disorder scale (GAD-7),
and the Positive and Negative Affect Scale at baseline and 2-3 weeks later (T2).
Results: Participants were on average 22.2 years old (SD 2.33), 67% female (47/70), mostly non-Hispanic (93%, 54/58), and
Caucasian (79%, 46/58). Participants in the Woebot group engaged with the conversational agent an average of 12.14 (SD 2.23)
times over the study period. No significant differences existed between the groups at baseline, and 83% (58/70) of participants
provided data at T2 (17% attrition). Intent-to-treat univariate analysis of covariance revealed a significant group difference on
depression such that those in the Woebot group significantly reduced their symptoms of depression over the study period as
measured by the PHQ-9 (F=6.47; P=.01) while those in the information control group did not. In an analysis of completers,
participants in both groups significantly reduced anxiety as measured by the GAD-7 (F(1,54)=9.24; P=.004). Participants’ comments
suggest that process factors were more influential on their acceptability of the program than content factors mirroring traditional
therapy.
Conclusions: Conversational agents appear to be a feasible, engaging, and effective way to deliver CBT.
(JMIR Ment Health 2017;4(2):e19) doi:10.2196/mental.7785
KEYWORDS
conversational agents; mobile mental health; mental health; chatbots; depression; anxiety; college students; digital health
Introduction
Up to 74% of mental health diagnoses have their first onset [...]
particularly common among college students, with more than
half reporting symptoms of anxiety and depression in the
previous year that were so severe they had difficulty functioning [...]
A paper on the effect of a chatbot on improving depression

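Trial period: October 2014 to December 2016
N=96, 30 minutes of stimulation per session
[Chart: MADRS score (axis 0 to 40; severity bands None, Mild, Moderate, Severe) for Ybrain and SSRI, with points labeled 1회 and 5회 (session 1 and session 5)]
Primary Outcome:

Secondary Outcome:

Clinical trial of a portable medical device for relieving depression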

Polysomnography (PSG) - Fitbit
• Almost none of the sleep-monitoring devices currently on the market have been validated for agreement with PSG.

https://www.cnbc.com/2018/04/10/facebook-cambridge-analytica-a-timeline-of-the-data-hijacking-scandal.html

• 2013 study

• Only 43% of apps had a privacy policy

• 72% of apps posed a medium (32%) or high (40%) risk to personal privacy

RECEIVED 22 December 2013
REVISED 7 July 2014
ACCEPTED 3 August 2014
PUBLISHED ONLINE FIRST 21 August 2014
Availability and quality of mobile health app
privacy policies
Ali Sunyaev1, Tobias Dehling1, Patrick L Taylor2, Kenneth D Mandl3
ABSTRACT
Mobile health (mHealth) customers shopping for applications (apps) should be aware of app privacy practices so they can make informed decisions about purchase and use. We sought to assess the availability, scope, and transparency of mHealth app privacy policies on iOS and Android. Over 35 000 mHealth apps are available for iOS and Android. Of the 600 most commonly used apps, only 183 (30.5%) had privacy policies. Average policy length was 1755 (SD 1301) words with a reading grade level of 16 (SD 2.9). Two thirds (66.1%) of privacy policies did not specifically address the app itself. Our findings show that currently mHealth developers often fail to provide app privacy policies. The privacy policies that are available do not make information privacy practices transparent to users, require college-level literacy, and are often not focused on the app itself. Further research is warranted to address why privacy policies are often absent, opaque, or irrelevant, and to find a remedy.
INTRODUCTION
Apple’s iOS and Google’s Android operating systems and associated application (app) stores, itunes.apple.com and play.google.com, are becoming the de facto global platforms for mobile health (mHealth).1,2 Recently, both platforms additionally announced the roll out of their own apps fostering app interoperability and offering central storage for all mHealth apps and sensors of users’ devices.3,4 mHealth apps leverage a wide range of embedded technology in iOS and Android devices for collecting and storing personal data, including contacts and calendars, and patient-reported data as well as information collected with cameras and sensors, including location, acceleration, audio, or orientation.5–7 Although patients value control of their personally identifiable data8,9 and the Federal Trade Commission10 recommends provision of privacy policies for mobile apps, little attention has been paid to the information security and privacy policies and practices of mHealth app vendors. Although both app stores retain the right to remove apps for infringements of privacy, neither has explicit policies addressing the information security and privacy of medical information. Users choose among an ecosystem of substitutable mHealth apps11 and should have transparency as to which apps have privacy practices best aligned with their individual preferences. We sought to assess mHealth apps for the presence and scope of privacy policies, and what information they offer.
METHODS
We surveyed (figure 1) the most frequently rated and thus popular English language mHealth apps in the Apple iTunes Store and the Google Play Store. App stores organize their offerings in categories (eg, Books, Games, and News). We selected apps from the Medical and Health and Fitness categories offered in both stores in May 2013. The iOS app store lists all apps by category and offers the desired information in plain hypertext markup language (HTML), enabling us to automatically parse app information to extract data. On the other hand, the Android app store uses dynamically generated HTML pages so that the HTML texts displayed in the browser do not contain much useful information, which is dynamically loaded from an underlying database. Hence, we used a third-party open-source interface, the android-market-api (http://code.google.com/p/android-market-api), for retrieving app information.
Upon initial review, many apps were not available in English, did not have an English description, or were not health-related, despite being offered in the categories Medical or Health and Fitness (eg, apps offering wallpapers). In order to exclude such apps from further assessment, we tagged all app descriptions with descriptive terms. The tags characterize health-related app functionality, access to information, and handling of information. We manually tagged 200 apps (100 Health and Fitness, 100 Medical) establishing an initial tag corpus and employed string matching12 to automatically tag the remaining apps. Apps not matched by at least four distinct tags were excluded from further assessment.
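The tag-based exclusion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tag names, the keyword lists, and the helper functions `tag_description` and `is_included` are hypothetical and do not come from the paper, which does not publish its tag corpus here. Only the threshold of four distinct tags is taken from the text.

```python
# Hypothetical tag corpus: tag -> substrings that signal it.
# These tags and keywords are illustrative only, not the paper's corpus.
TAG_CORPUS = {
    "tracking": ["track", "log your"],
    "medication": ["medication", "pill", "dose"],
    "glucose": ["glucose", "blood sugar"],
    "reminder": ["reminder", "alert"],
    "sharing": ["share", "export"],
    "fitness": ["workout", "exercise"],
}

MIN_DISTINCT_TAGS = 4  # threshold stated in the paper


def tag_description(description, corpus=TAG_CORPUS):
    """Return the set of tags whose keywords occur in the description."""
    text = description.lower()
    return {
        tag
        for tag, keywords in corpus.items()
        if any(keyword in text for keyword in keywords)
    }


def is_included(description, corpus=TAG_CORPUS, min_tags=MIN_DISTINCT_TAGS):
    """Keep an app only if at least `min_tags` distinct tags match."""
    return len(tag_description(description, corpus)) >= min_tags


# Made-up app descriptions, for illustration only.
health_app = ("Track your blood sugar, set medication reminders, "
              "and share reports with your doctor.")
wallpaper_app = "Beautiful HD wallpapers to share with friends."

print(is_included(health_app))     # five distinct tags match
print(is_included(wallpaper_app))  # only the 'sharing' tag matches
```

Plain substring matching is the simplest form of string matching; the paper may well have used a more careful variant, so the sketch should be read as one possible interpretation.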
Discovery and evaluation of privacy policies
We used a three-step manual procedure for privacy policy discovery looking at typical locations for privacy policies. Privacy policies were abstracted from March 2013 to June 2013. First, we checked for a privacy policy on the app store web site for the particular app. Then we checked the web page maintained
Correspondence to Professor Ali Sunyaev, Faculty of Management, Economics and Social Sciences, University of Cologne, Albertus-Magnus-Platz, Cologne 50923,
Germany; sunyaev@wiso.uni-koeln.de
BRIEF COMMUNICATION
Sunyaev A, et al. J Am Med Inform Assoc 2015;22:e28–e33. doi:10.1136/amiajnl-2013-002605, Brief Communication
• 2014 study

• Of 600 apps, only 183 (about 30%) had a privacy policy

Letters
RESEARCH LETTER
Privacy Policies of Android Diabetes Apps and Sharing of Health Information
Mobile health apps can help individuals manage chronic health conditions.1 One-fifth of smartphone owners had health apps in 2012,2 and 7% of primary care physicians recommended a health app.3 The US Food and Drug Administration has approved the prescription of some apps.4 Health apps can transmit sensitive medical data, including disease status and medication compliance. Privacy risks and the relationship between privacy disclosures and practices of health apps are understudied.
Methods | On January 3, 2014, we identified all Android diabetes apps by searching Google Play using the term diabetes. Android is the most popular mobile operating system worldwide with 82.8% market share (compared with Apple iOS’s 13.9%).5 We collected and analyzed privacy policies and permissions (disclosures of what apps can access or control on the device) for apps that remained 6 months after our initial search. Because consumers may want to know about privacy protections before choosing an app, we determined which apps had policies available predownload and what the policies protected. Then we installed a random subset of apps to determine whether data were transmitted to third parties, defined as any website not directly under the developer’s control, such as data aggregators or advertising networks.
We performed χ² tests of independence (Excel 2010, Microsoft) to determine whether apps with privacy policies were more likely to protect personal information than apps without privacy policies. A 2-sided P value less than .05 was considered significant.
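As a worked illustration of a χ² test of independence, the sketch below applies it to the sharing counts the letter reports in its Results for the transmission subset (31 of 41 apps without privacy policies and 19 of 24 apps with them shared user information). It assumes a plain Pearson statistic with no continuity correction; the letter only says the tests were run in Excel 2010, so the authors' exact procedure may differ. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: apps without / with a privacy policy (transmission subset, N = 65).
# Columns: shared user information / did not share.
chi2, p = chi_square_2x2(31, 10, 19, 5)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
print("significant at .05" if p < 0.05 else "not significant at .05")
```

Under these assumptions the P value is far above .05, consistent with the letter's statement that the two groups did not differ significantly in sharing.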
Results | We identified 271 diabetes apps and chose a random sample of 75 for the transmission analysis. Within 6 months, 60 apps became unavailable, leaving 211 apps in the sample and 65 apps in the subset. Most of the 211 apps (81%) did not have privacy policies. Of the 41 apps (19%) with privacy policies, not all of the provisions actually protected privacy (eg, 80.5% collected user data and 48.8% shared data) (Table 1). Only 4 policies said they would ask users for permission to share data.
Permissions, which users must accept to download an app, authorized collection and modification of sensitive information, including tracking location (17.5%), activating the camera (11.4%), activating the microphone (3.8%), and modifying or deleting information (64.0%) (Table 2).
In the transmission analysis, sensitive health information from diabetes apps (eg, insulin and blood glucose levels) was routinely collected and shared with third parties, with 56 of 65 apps (86.2%) placing tracking cookies; 31 of the 41 apps (76%) without privacy policies, and 19 of 24 apps (79%) with privacy policies shared user information, which was not statistically significantly different (N = 65;
Table 1. Privacy Policy Provisions for the 41 Apps With Privacy Policies (19% of the 211 Apps)a
Type of Privacy Policy Provision    Apps, No. (%)
Personal Information
  Shared if required by law 25 (61.0)
  Collected when the app is used 21 (51.2)
  Collected when a user registers through an online account 21 (51.2)
  Stored in the developer’s system 18 (43.9)
  Only disclosed with the user’s consent 12 (29.3)
  Shared to improve service 11 (26.8)
  Shared with business partners 10 (24.4)
  Not sold 9 (22.0)
  No personal information from children collected 6 (14.6)
User Data
  Collected 33 (80.5)
  Shared with partners and/or third parties 20 (48.8)
  May be used for advertisement purposes 16 (39.0)
  May be transferred to various countries around the world 11 (26.8)
  Electronic safeguards for data protection are used 22 (53.7)
  Cookies will be used 20 (48.8)
  Log files will be collected 7 (17.1)
Aggregated User Data
  Does not contain personal information 19 (46.3)
  May be used to create statistics 15 (36.6)
  May be disclosed to advertisers 7 (17.1)
User Options
  Can opt out of cookies 13 (31.7)
  Can opt out of receiving emails 9 (22.0)
  Can opt out of receiving marketing materials 6 (14.6)
a Provisions reflect the language used in privacy policies. Less common provisions (n ≤ 5) were the following: no personal information from children under 13 years of age is collected without the consent of a parent; personal information will not be disclosed to third parties for direct marketing purposes; data will be shared only with permission from the user; data will not be shared; personal information will not be shared; health information is treated differently than other types of information; data will not be sold; information is collected during the app download process; personal information will be shared with advertisers; user can opt out of information transfer to third parties for marketing purposes; no data will be stored; aggregated user data may be disclosed to analytics and search engine providers; personal information will be shared with research organizations.
jama.com (Reprinted) JAMA March 8, 2016 Volume 315, Number 10 1051
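• 2015 study

• Of 211 apps in total, only 41 (19%) had a privacy policy

• Looking at the privacy policies of these 41 apps:

• 48.8% share user data with third parties

• 61.0% state that data may be disclosed when required by law

• 43.9% store data in the developer's system

• In other words, even when an app has a privacy policy, in many cases it is a policy that "does not protect" privacy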

Summary
Recent advances and challenges
of digital mental healthcare
• Digital Phenotype
• AI for mental health
• Digital Therapeutics
• Sleep disorder
• Limitations

Feedback/Questions
• E-mail: yoonsup.choi@gmail.com
• Blog: http://www.yoonsupchoi.com
• Facebook: 최윤섭 디지털 헬스케어 연구소 (Yoon Sup Choi Digital Healthcare Institute)
