1. Lifestreams
A modular sense-making toolset for identifying
important patterns from everyday life
Cheng-Kang (Andy) Hsieh
Hongsuda Tangmunarunkit
Faisal Alquaddoomi
John Jenkins
Jinha Kang
Cameron Ketcham
Brent Longstaff
Joshua Selsky
Betta Dawson
Dallas Swendeman
Deborah Estrin
Nithya Ramanathan
2. Health Concerns of Postpartum Moms
• Obesity
• Diabetes
• Depression
• Cardiovascular Diseases
• Suggested Strategy: Make
behavior changes when
children are young.
* Sattar, Naveed, and Ian A. Greer. "Pregnancy complications and maternal cardiovascular risk: opportunities for intervention and screening?" British Medical Journal, 2002.
* Gunderson, Erica P., and Barbara Abrams. "Epidemiology of gestational weight changes after pregnancy." Epidemiologic Reviews, 2002.
* Nielsen, D., et al. "Postpartum depression: identification of women at risk." BJOG: An International Journal of Obstetrics & Gynaecology 107.10, 2000.
3. Use mobile Health (mHealth) data to
provide personalized intervention
[Diagram: mHealth data feeds Data Processing (transform, cluster, infer, viz), which drives Personalized Intervention (smart apps, coaches)]
Photo: Marshall Astor, WWW
4. Main Contributions
• Developed Lifestreams pipeline and an initial
set of modules that identified health-related
trends and patterns.
• Applied Lifestreams to a real-world mHealth
study.
• Validated the analysis results through
interviews with the study participants.
• Evaluated the extensibility of Lifestreams by
applying it to two additional studies.
5. Pilot Field Study: Mom’s Study
• 44 young mothers monitored diet, stress, and exercise for 6 months.
Dataset:
• Self-reports
  • On average, 2 self-reports per mom per day.
• Passive data streams
  • GPS data
  • Ambient Wi-Fi signals
  • Phone accelerometer data
• Totaling 14 GB of data.
ohmage: a mobile data collection tool
6. Use Lifestreams to analyze Moms’
data.
• How are moms progressing in their weight control?
• What is the cause of an undesirable behavior?
• What are major life events that affect moms’
behaviors?
• Can these be automatically detected to trigger some
action?
8. Raw Passive Data Streams
[Figure panels: GPS Data, Accelerometry, Ambient Wi-Fi Signals]
• Collected every 1 or 5 minutes.
• Over 300 MB per mom.
• Noisy and hard to analyze.
10. Lifestreams
• Modular and extensible data processing
pipeline.
• Allows researchers and app developers
to plug in their best tools and build their
apps on top of existing modules.
• An initial set of Lifestreams modules
was developed for Mom’s study to
answer health-related questions.
[Pipeline diagram: Raw mHealth data → Feature Extraction → Feature Selection → Inference → Visualization]
13. Location Feature Extraction Module
• Transform raw GPS and Wi-Fi data into meaningful location features
(home, workplace, lunch place, coffee shop, school, grocery store, etc.).
14. DBSCAN (A Spatial Clustering Algorithm)
• Data points are clustered into geo-areas of meaningful places.
• Time complexity: O(n^2), where n is the number of data points. In Mom’s study, n > 200,000.
[Figure: one-day location trace of a mom, with clusters labeled Home, Lunch place, and Work place]
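The clustering step on this slide can be illustrated with a minimal pure-Python DBSCAN. This is only a sketch: the toy coordinates, the `eps` radius, and the `min_pts` value are assumptions chosen for illustration, not parameters from the study, and plain Euclidean distance stands in for a proper geographic distance.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force region query; scanning every point for every query
        # is what gives the O(n^2) cost mentioned on the slide.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Toy one-day trace: three fixes near "home", three near "work", one stray fix.
trace = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (20.0, 20.0)]
labels = dbscan(trace, eps=0.5, min_pts=3)
```

Each non-negative label corresponds to one candidate place; the stray fix is left as noise.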
15. Proposed Algorithm: Two-Phase DBSCAN
• Time complexity: O(K^2 D + M^2), or O(K^2 + M^2) if run incrementally, where
  K: # of data points in a day
  D: # of days
  M: # of places extracted in Phase 1
[Diagram: In Phase 1, DBSCAN is run separately on each day's location trace (Day 1, Day 2, ..., Day N-1, Day N), producing per-day places L*(1,1), L*(1,2), ..., L*(N,1), L*(N,2), ... In Phase 2, DBSCAN is run once more over all per-day places to produce the final places L1, L2, L3, L4, ...]
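A rough sketch of the two-phase idea follows. The slides do not say how a Phase 1 cluster is summarized or which parameters Phase 2 uses, so two assumptions are made here: each per-day cluster is represented by its centroid, and Phase 2 uses `min_pts=1` so that a place seen on a single day is kept. The `dbscan` helper, the coordinates, and `eps` are likewise illustrative only.

```python
import math
from collections import defaultdict

def dbscan(points, eps, min_pts):
    """Minimal brute-force DBSCAN; returns a cluster id per point, -1 for noise."""
    labels = [None] * len(points)
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return labels

def centroids(points, labels):
    """Average the points of each cluster (assumed summary of a 'place')."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != -1:
            groups[lab].append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) for _, g in sorted(groups.items())]

def two_phase_dbscan(daily_traces, eps, min_pts):
    # Phase 1: cluster each day separately, costing about K^2 per day.
    daily_places = []
    for day in daily_traces:
        daily_places.extend(centroids(day, dbscan(day, eps, min_pts)))
    # Phase 2: cluster the M per-day places, costing about M^2.
    merged = dbscan(daily_places, eps, 1)  # min_pts=1 is an assumption
    return centroids(daily_places, merged)

day1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
day2 = [(0.05, 0.05), (0.15, 0.05), (0.05, 0.15), (9.0, 9.0)]
places = two_phase_dbscan([day1, day2], eps=0.5, min_pts=3)
```

Because each day is clustered independently, a new day can be processed on arrival and only Phase 2 needs to be rerun, which is consistent with the incremental cost quoted on the slide.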
18. Feature Selection
• Over 140 features were automatically extracted for each mom.
• Perform feature selection to select relevant features.
  • Manual selection by domain knowledge.
  • Semi-automatic selection assisted by statistical methods (e.g., mutual information).
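As one possible reading of the statistical assistance mentioned above, the sketch below ranks discrete features by their mutual information with a target behavior. The feature names and values are hypothetical, and the slides do not state how mutual information was actually estimated; this version uses simple empirical frequencies.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

# Hypothetical daily features and a hypothetical target (1 = high-stress day).
target = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "ate_out": [1, 1, 0, 0, 1, 0, 1, 0],
    "weekday": [0, 1, 0, 1, 0, 1, 0, 1],
}
ranked = sorted(
    features,
    key=lambda name: mutual_information(features[name], target),
    reverse=True,
)
```

Features at the top of `ranked` would be the candidates a researcher reviews first; the final choice can still rest on domain knowledge.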
20. Change Detection
• Use: Identify changes in behaviors related to diet,
stress, exercise.
• Method: A statistical change detection algorithm is
used to identify change points in behavioral
features.
21. Statistical Change Detection Algorithm
[Figure, repeated across slides 21-24: Daily Stress Level of a Mom (y-axis 2 to 5) over time t (x-axis 0 to 150), with candidate splitting points and the detected change point S marked]
1. Assume that we are now at time t.
2. Iteratively analyze all the possible splitting points.
3. Check whether a splitting point S makes the difference between the two subsets of data exceed a threshold.
4. If so, consider S a change point, discard the previous data, and restart the algorithm.
25. Measure Distribution Difference Between Different Parts of Data
• Human behavior is not assumed to be normally distributed.
• The Mann-Whitney U test is preferred over Student’s t-test because it:
  • Makes no assumption about the shape of the distribution.
  • Is based on ranks of observations rather than absolute values.
[Figure: the same daily stress series, split at point S into the two subsets being compared]
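The four steps of the change detection algorithm, combined with a rank-based difference measure, might look like the following sketch. Several details are assumptions rather than the authors' stated method: the difference is scored with the normal-approximation z-score of the Mann-Whitney U statistic (without tie correction, for brevity), and `threshold` and `min_size` are illustrative values.

```python
import math

def mann_whitney_z(a, b):
    """Normal-approximation z-score of the Mann-Whitney U statistic."""
    pooled = sorted(a + b)
    ranks, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # tied values share their mean rank
        i = j
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[v] for v in a) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return (u1 - mean) / sd

def detect_change_points(series, threshold=3.0, min_size=5):
    """Return indices at which the distribution of `series` appears to change."""
    change_points, start = [], 0
    t = start + 2 * min_size
    while t <= len(series):            # step 1: we are now at time t
        window = series[start:t]
        best_s, best_z = None, threshold
        # Step 2: try every splitting point that leaves min_size items per side.
        for s in range(min_size, len(window) - min_size + 1):
            z = abs(mann_whitney_z(window[:s], window[s:]))
            if z > best_z:             # step 3: difference exceeds the threshold
                best_s, best_z = s, z
        if best_s is not None:
            start += best_s            # step 4: discard data before S, restart
            change_points.append(start)
            t = start + 2 * min_size
        else:
            t += 1
    return change_points

# Toy series: daily stress level jumps from 2 to 4 on day 30.
stress = [2] * 30 + [4] * 30
points = detect_change_points(stress)
```

Note that a change can only be reported some days after it happens, once enough post-change observations have accumulated to push the score over the threshold.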
26. Change Detection Results
• Changes are detected in diet and stress behaviors.
• Results were validated through the interview.
[Figure: five stacked time series from Apr to Oct. Panels: Daily Stress Level, Daily Eating Quality, Stress Caused by Health, Walking Time (Minutes), Driving Distance (Miles); one axis is labeled (# of Reports). Legend: Change Point, Detection Time, Weight Watcher Program.]
27. Correlation Analysis Module
• Use: Identify patterns of an undesirable behavior.
• Method: Five different measurements are used to
measure the correlation between different types of
behavioral features.
Table 1. Five measures used to compute correlation coefficients between quantitative, ordinal, and nominal features.

               Quantitative          Ordinal               Nominal
Quantitative   Pearson’s r           Spearman’s rho        Point biserial r_pb
Ordinal        Spearman’s rho        Spearman’s rho        Rank biserial r_rb
Nominal        Point biserial r_pb   Rank biserial r_rb    Phi
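The type-based choice of measure can be sketched as a small dispatch table. The code assumes that nominal features are dichotomous and coded 0/1, in which case the point biserial and phi coefficients both reduce to Pearson's r on the coded values; the rank biserial uses Glass's formula. Function names and the type labels are illustrative, not taken from the Lifestreams code base.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def rank_biserial(ordinal, binary):
    """Glass's rank biserial: 2 * (mean rank of group 1 - group 0) / n."""
    ranks = average_ranks(ordinal)
    g1 = [r for r, b in zip(ranks, binary) if b == 1]
    g0 = [r for r, b in zip(ranks, binary) if b == 0]
    return 2 * (sum(g1) / len(g1) - sum(g0) / len(g0)) / len(ranks)

# One entry per unordered pair of feature types, mirroring Table 1.
MEASURES = {
    frozenset(["quantitative"]): pearson,
    frozenset(["quantitative", "ordinal"]): spearman,
    frozenset(["ordinal"]): spearman,
    frozenset(["quantitative", "nominal"]): pearson,  # point biserial (0/1 coding)
    frozenset(["ordinal", "nominal"]): rank_biserial,
    frozenset(["nominal"]): pearson,                  # phi (0/1 coding)
}

def correlate(x, x_type, y, y_type):
    measure = MEASURES[frozenset([x_type, y_type])]
    if measure is rank_biserial and x_type == "nominal":
        x, y = y, x  # rank_biserial expects the ordinal feature first
    return measure(x, y)

r = correlate([0, 0, 1, 1], "nominal", [1, 2, 3, 4], "ordinal")
```

Keeping the table symmetric means a full correlation matrix over mixed feature types can be filled by calling `correlate` on every pair.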
28. Correlation Matrix for Stress-Related Pattern
• Places vs. Stress
• Time for self vs. Stress
[Figure: lower-triangular matrix of correlation coefficients, colored from negative to positive, among: Reported Exercise Time, Stress Level in Late Afternoon, Stress Level in Mid Day, Stress Level in Morning, Time for Self, Stress Caused by Finances, Stress Caused by Traffic, Overall Stress Level, Time in New House, Time in Old House (labeled Place 1 and Place 2), Time Arrive Work, Work Schedule Variance.]
29. Correlation Change Detection
• Use: Identify changes in behavior patterns.
• Method: Similar to single-feature change detection,
but it measures the difference between the
correlations of two features in different subsets of
the data.
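A minimal sketch of this idea is shown below. It reuses the scan-and-restart structure of single-feature change detection but scores each splitting point by the absolute difference between Pearson correlations on the two sides. The scoring rule, `threshold`, and `min_size` are assumptions for illustration; the slides do not specify them.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Treat a constant segment as uncorrelated instead of dividing by zero.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlation_change_points(x, y, threshold=1.0, min_size=5):
    """Return indices where the correlation between x and y appears to shift."""
    change_points, start = [], 0
    t = 2 * min_size
    while t <= len(x):
        best_s, best_diff = None, threshold
        for s in range(start + min_size, t - min_size + 1):
            before = pearson(x[start:s], y[start:s])
            after = pearson(x[s:t], y[s:t])
            diff = abs(before - after)
            if diff > best_diff:
                best_s, best_diff = s, diff
        if best_s is not None:
            change_points.append(best_s)
            start, t = best_s, best_s + 2 * min_size  # discard data before S
        else:
            t += 1
    return change_points

# Toy data: y follows x for 20 days, then moves opposite to x.
x = [i % 10 for i in range(40)]
y = x[:20] + [-v for v in x[20:]]
shifts = correlation_change_points(x, y)
```

In this toy series the two halves have opposite correlations that largely cancel in the overall coefficient, which is the kind of pattern a single overall correlation would hide.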
30. Correlation Change Detection Result
• Reversed correlation between exercise and stress.
• Not revealed by the overall correlation.
[Figure: Exercise Time (Minutes) and Daily Overall Stress, Apr to Oct. Overall correlation: 0.10. Per-segment correlations: -0.58, -0.45, 0.71.]
[Figure: Exercise With Child and Daily Maximum Stress, Apr to Oct. Overall correlation: 0.32. Per-segment correlations: -0.07, 0.68.]
31. Extending Lifestreams
• Family Wellness Study:
• Study the interactions among family
members using self-reports and acoustic
data.
• Mobilize (participatory sensing
exercise for high school students):
• Study the relation between the students’
attachment to phones and their
participation in the study.
• More Potential Use Cases
• PTSD, chronic pain, inflammatory, depression, insomnia,
trauma, asthma and more ...
32. Limitations and Challenges
• Missing data problem
• Need robust statistical techniques.
• Need a more engaging mobile user experience.
• Difficulty in validation
• Interview data is insufficient.
• Expert knowledge will help.
• Too much complicated information
• Need innovations in UI & data analysis to make
information more accessible for busy users.
33. Lifestreams is open source
software
• We invite people to work with us to make sense of
data and, together, fulfill the promise of mHealth.
• Lifestreams: https://github.com/changun/Lifestreams
• Ohmage: https://github.com/ohmage/