Data Augmentation of Wearable Sensor Data for Parkinson’s

Data augmentation for
PD using CNN
임우담
woodam.lim@gmail.com

Contents
1. Introduction
2. Related work
3. Explanation
4. Experiment
5. Result
6. Conclusion

Ⅰ. Introduction
Recent work shows that CNNs have been highly successful. However, their need for a large amount of labeled data has been a major obstacle to applying them.
• Data augmentation, which creates new samples from existing ones, is proposed as a remedy.
= The main challenge is generating new samples while keeping their labels correct.
• Image research typically augments data by scaling, rotating, translating, cropping, erasing, and warping images, but other domains need different methods.
• For example, scaling acceleration data may change its label, because some labels are distinguished by the intensity of motion.
• This study addresses motor state detection for Parkinson's disease (PD) patients using wearable sensor data.

Ⅰ. Introduction
• PD patients commonly experience hypo-bradykinesia* and dyskinesia**.
• Dopaminergic treatment relieves bradykinesia, but an excessive dose induces dyskinesia.
• Choosing the right dose is therefore essential to treatment, yet it usually relies on patient self-reports and the attending physician's observations.
• Automating the assessment with wearable sensors requires a large amount of labeled data, but collecting and labeling it is difficult and time-consuming.
• The paper describes the task as even more challenging because of large variability due to varied symptom patterns, interference from irrelevant motion, and noisy labels.
* hypo-bradykinesia: abnormally slow movement, with dulled physical and mental responses.
** dyskinesia: reduced voluntary movement accompanied by involuntary movements (such as tics or chorea).

Ⅰ. Introduction
• A set of approaches for data augmentation of wearable sensor datasets for CNN-
based classification.
• Application to the task of PD motor state classification, using a clinician-labeled
dataset of 25 PD patients in daily living conditions.
• Experimental comparison of various data augmentation methods.

Ⅱ. Related work
Earlier wearable-sensor studies exist, but all were limited by unavoidable hand movements.
• One study estimates the severity of tremor, bradykinesia, and dyskinesia from multiple accelerometers using support vector machines (SVMs).
• Another applies a CNN to classify bradykinesia as present or absent, based on wearable sensor data collected during several motor tasks.
• Deep learning has also been applied to PD. Restricted Boltzmann machines (RBMs) were the main choice, but dropout and data augmentation have recently become more common because they are simpler to use.
= In short, as data augmentation has come to serve as a regularizer, it has emerged as an important preprocessing step in machine learning.
However, standardized data augmentation methods for wearable sensor data have not yet been studied.

Ⅲ. Explanation
A. Challenges in PD data
• Bradykinesia: abnormally slow movement, with dulled physical and mental responses.
• => Bradykinesia can also be accompanied by tremor, which changes the readings of a wearable sensor significantly.
• => It becomes a major source of mispredictions and of confusion between bradykinesia and a patient who is voluntarily at rest.

Ⅲ. Explanation
• Dyskinesia: reduced voluntary movement accompanied by involuntary movements (such as tics or chorea).
• It is easier to notice than bradykinesia.
• It differs from tremor in showing nonrhythmic, flowing movements, but it can still cause misclassification.

Ⅲ. Explanation
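• Several factors also produce clear mismatches between the pattern in the sensor data and the expert's label:
• The patient is in a dyskinesia state but the hand stays still, for example while gripping a chair.
• The motor state changes within a fixed-length window, so the pattern shifts partway through.
• Voluntary movements unrelated to the assessment.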

Ⅲ. Explanation
B. Augmentation Methods for Wearable Sensor Data
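• The authors note that augmentation here differs from augmentation for image recognition, and call for a careful approach that stays within the target task.
• In particular, wearable sensor data is hard to inspect visually, so unlike in image recognition, the result of data augmentation is hard to judge.
• The effect of the various data augmentation methods should therefore be evaluated through systematic comparative experiments.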

Ⅲ. Explanation
• One factor that can introduce label-invariant variability in wearable sensor data is the difference in sensor placement between participants. For example, an upside-down placement of the sensor can invert the sign of the sensor readings. However, different sensor placements do not change the labels, so they can be regarded as label-preserving transformations. Therefore, data can be augmented by applying arbitrary rotations to the existing data as a way of simulating different sensor placements.
= Differences in sensor placement can affect the sign of the readings but not the label.
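
The rotation idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a window is a NumPy array of shape (samples, 3), and the function names and the choice of a uniformly random angle about a random axis are assumptions made here.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Rotation by a random angle about a random unit axis (Rodrigues' formula)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-np.pi, np.pi)
    x, y, z = axis
    # Cross-product (skew-symmetric) matrix of the axis.
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def rotate(window, rng):
    """Apply one random rotation to every 3-axis sample in the window."""
    R = random_rotation_matrix(rng)
    return window @ R.T

rng = np.random.default_rng(0)
window = rng.normal(size=(6960, 3))   # placeholder data, not real sensor readings
rotated = rotate(window, rng)
```

A rotation changes the direction of each acceleration vector but not its magnitude, which is why the label is expected to survive.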

Ⅲ. Explanation
• Another factor that can introduce unnecessary variability is the temporal location of activity events, for example, the appearance of tremor. Since the fixed-size window segmentation is arbitrary, the location of the observed symptom in the window does not have any meaning. In other words, the same PD data can be represented differently based on the choice of window localization. Thus, we may augment data by perturbing the location of the windows or events.
= This method changes the temporal location of activity events.
= The fixed-size window is set arbitrarily in the first place, so the location of a symptom within the window carries no meaning.
= The data can therefore be augmented by shifting the location of the events or windows.
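
One way to perturb event locations is the segment permutation listed later in these slides. The sketch below assumes equal-length segments and a (samples, axes) NumPy array; the function name and the segment count are illustrative choices, not taken from the paper.

```python
import numpy as np

def permute_segments(window, n_segments, rng):
    """Split the window along time into n_segments chunks and shuffle their order."""
    segments = np.array_split(window, n_segments, axis=0)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=0)

rng = np.random.default_rng(0)
# Toy window: 12 time steps, 3 axes, values equal to the time index.
window = np.arange(12, dtype=float).reshape(-1, 1).repeat(3, axis=1)
permuted = permute_segments(window, n_segments=4, rng=rng)
```

Every sample is kept and only the order of the chunks changes, so an event simply moves to a different position inside the window.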

Ⅲ. Explanation
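Augmentation methods that do not change the data label are therefore proposed.
• Jittering = adds a small amount of noise to the data values
• Scaling = a linear transformation that enlarges or shrinks the data
• Cropping = changes the aspect ratio by cutting out part of the data
• Rotating = changes the angle
• Permutation = splits the 1-minute signal into N random segments and randomly reorders them
• Sampling (Warping)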
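
Three of the simpler methods in the list can be sketched in a few lines. This is a hedged illustration for (samples, axes) NumPy arrays: the noise levels, the per-axis scaling factor, and the function names are assumptions made here, not values from the paper.

```python
import numpy as np

def jitter(window, sigma, rng):
    """Add zero-mean Gaussian noise to every value."""
    return window + rng.normal(0.0, sigma, size=window.shape)

def scale(window, sigma, rng):
    """Multiply each axis by one random factor drawn around 1.0."""
    factor = rng.normal(1.0, sigma, size=(1, window.shape[1]))
    return window * factor

def crop(window, length, rng):
    """Keep a random contiguous sub-window of the given length."""
    start = rng.integers(0, window.shape[0] - length + 1)
    return window[start:start + length]

rng = np.random.default_rng(0)
window = rng.normal(size=(6960, 3))          # placeholder data
jittered = jitter(window, sigma=0.05, rng=rng)
scaled = scale(window, sigma=0.1, rng=rng)
cropped = crop(window, length=6000, rng=rng)
```

Cropping shortens the input, which is consistent with the slides' later note that the cropping experiments use a different CNN.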

Ⅳ. Experiment
A. Data Preparation
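Subjects: 27 patients
Device: Microsoft Band 2
Protocol: recorded during daily living, with no tasks required of the patients
Acceleration and gyroscope data were collected at about 62.5 Hz, and only the acceleration data was used for classification.
Because wireless communication between the wearable sensor and the recording device often makes the sampling intervals irregular, the data is resampled at uniform intervals at 120 Hz.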
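
The resampling step can be illustrated as below. The slides do not say which interpolation was used, so linear interpolation with np.interp is an assumption here, as are the function name and the synthetic timestamps.

```python
import numpy as np

def resample_uniform(timestamps, values, rate_hz=120.0):
    """Linearly interpolate irregular samples onto a uniform time grid.

    timestamps: (n,) increasing times in seconds; values: (n, n_axes).
    """
    duration = timestamps[-1] - timestamps[0]
    n_out = int(np.floor(duration * rate_hz)) + 1
    grid = timestamps[0] + np.arange(n_out) / rate_hz
    resampled = np.column_stack(
        [np.interp(grid, timestamps, values[:, k]) for k in range(values.shape[1])]
    )
    return grid, resampled

rng = np.random.default_rng(0)
# Synthetic example: 125 irregular samples over 2 s (roughly 62.5 Hz on average).
t = np.sort(rng.uniform(0.0, 2.0, size=125))
t[0], t[-1] = 0.0, 2.0
values = np.column_stack([t, 2.0 * t, -t])   # linear signals, easy to verify
grid, resampled = resample_uniform(t, values)
```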

Ⅳ. Experiment
A. Data Preparation
As a result, a 1-minute window should contain 7200 samples, but some windows have fewer because of missing data. To produce inputs of equal length, the windows are cut to use 58 seconds of data instead of 60. That is, every input uses 6960 samples rather than 7200.
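
The sample counts follow directly from the 120 Hz rate. The sketch below checks the arithmetic and shows one possible way to fix the length; keeping the first 58 seconds and discarding windows that are still too short are assumptions made here, since the slides do not say which part of the window is kept.

```python
import numpy as np

RATE_HZ = 120
FULL_LEN = RATE_HZ * 60   # samples in a complete 1-minute window
USED_LEN = RATE_HZ * 58   # samples actually fed to the network

def fix_length(window, used_len=USED_LEN):
    """Return the first used_len samples, or None if the window is too short."""
    if window.shape[0] < used_len:
        return None  # assumption: windows with too much missing data are dropped
    return window[:used_len]

slightly_short = np.zeros((7100, 3))   # a window missing 100 samples
too_short = np.zeros((6000, 3))
fixed = fix_length(slightly_short)
dropped = fix_length(too_short)
```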
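While the wearable sensor data was being collected, a clinical expert labeled the PD patients' motor state at 1-minute intervals. Of the 219 hours of data, 154 hours were labeled under the expert's direct observation. The 50 hours of data collected during walking and lying-down activities were removed from the training data because observation was limited.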

Ⅳ. Experiment
A. Data Preparation
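Symptom-free data is removed, which simplifies the task to a two-class problem: bradykinesia versus dyskinesia.
Data collected per patient: 9 to 273 minutes
12 patients showed both bradykinesia and dyskinesia / 13 showed only one of the two
For a fair evaluation, 5 patients with relatively large samples of both classes were chosen as test patients, and the other 20 as training patients.
As a result, 3530 minutes of data from the 20 training patients (bradykinesia: 1715, dyskinesia: 1815) were used for training, and 1090 minutes from the remaining 5 patients (bradykinesia: 442, dyskinesia: 648) for testing.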
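
As a quick sanity check on these counts, the snippet below recomputes the totals and the class shares. The majority-class baseline is an added point of reference, not a figure from the paper: it is the accuracy of always predicting the more frequent test class.

```python
train = {"bradykinesia": 1715, "dyskinesia": 1815}
test = {"bradykinesia": 442, "dyskinesia": 648}

def summarize(split):
    """Return total minutes and the share of each class."""
    total = sum(split.values())
    shares = {label: minutes / total for label, minutes in split.items()}
    return total, shares

train_total, train_shares = summarize(train)
test_total, test_shares = summarize(test)

# Accuracy of a trivial classifier that always predicts the larger test class.
majority_baseline = max(test_shares.values())
```

The training set is close to balanced, while dyskinesia makes up roughly 59% of the test minutes, so test accuracy should be read against that baseline.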

Ⅳ. Experiment
B. The CNN architecture
Why a CNN rather than an RNN
= to reduce the number of parameters
7-layer CNN with 16-32-64-64-64-64-64 feature maps
Convolution sizes: 4×1, 4×1, 3×1, 3×3, 3×3, 3×3, 3×1
Activation function: ReLU, with softmax at the output layer
In place of fully connected layers: a GAP (global average pooling) layer -> to reduce the number of parameters
* The CNN architectures for the cropping and sampling data augmentation experiments are 6-layer CNNs, which consist of 32-64-64-64-64-64 feature maps.
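
To make the parameter argument concrete, the sketch below counts convolution weights for the listed kernel sizes and compares a GAP head with a flatten-plus-dense head. Several things are assumptions made here: a single input channel, bias terms in every layer, no batch normalization, and a hypothetical 100×3×64 final feature map (the slides do not give strides or pooling, so the real shape is unknown).

```python
import numpy as np

feature_maps = [16, 32, 64, 64, 64, 64, 64]
kernels = [(4, 1), (4, 1), (3, 1), (3, 3), (3, 3), (3, 3), (3, 1)]

def conv_params(kernels, feature_maps, in_channels=1):
    """Weights plus biases over a stack of 2-D convolution layers."""
    total, c_in = 0, in_channels
    for (kh, kw), c_out in zip(kernels, feature_maps):
        total += kh * kw * c_in * c_out + c_out
        c_in = c_out
    return total

def global_average_pool(features):
    """(height, width, channels) -> (channels,) by averaging over space."""
    return features.mean(axis=(0, 1))

n_classes = 2
final_map = np.ones((100, 3, 64))           # hypothetical final feature map
pooled = global_average_pool(final_map)

conv_total = conv_params(kernels, feature_maps)
gap_head = pooled.size * n_classes + n_classes          # dense layer after GAP
flatten_head = final_map.size * n_classes + n_classes   # dense layer after flatten
```

Under these assumptions the GAP head needs 130 parameters, against tens of thousands for the flattened alternative, which matches the slides' stated motivation.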

Ⅳ. Experiment
Method

Ⅴ. Result
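The combination of rotation and permutation performed best.
Rotation offsets the signal variability caused by the sensor's mounting orientation or the pose of the arm.
Permutation splits the 1-minute signal into N random segments and randomly reorders them, which roughly reproduces the idea that a symptom can appear anywhere within the minute.
Result: accuracy rose from 76.7% (no augmentation) to 92.0% (rotation and permutation), a gain of about 15 percentage points.
Limitation: signals from the normal state are disproportionately large and cover a wide range of activities, so they swamp the effect of every other label.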

Ⅵ. Conclusion
An automatic classification algorithm for PD motor state monitoring was developed.
The task was challenging because of limited data availability, large inter-patient and inter-class variability, noisy labels, and interference by irrelevant motion signals.
Nevertheless, a 7-layer CNN combined with rotational and permutational data augmentation raised classification accuracy from 76.7% to 92.0%.

Reference
Terry Taewoong Um, "Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks", ICMI 2017.
