Machine Learning Challenges for Automated Prompting in Smart Homes
Barnan Das
May 22, 2014
Older adult (65+) population in the US
• 2009: 40 million
• 2030: 72 million
• Alzheimer’s patients: 5 million
• Unpaid caregivers: 15 million
• Caregivers who report stress: 60%
Machine learning algorithms trained on smart home sensor data can predict when an individual faces difficulty while performing everyday activities.
Smart Home Studies
                  Study 1        Study 2
Participants      400            180
Activities        8              6
Activity Errors   Naturalistic   Naturalistic
Overview
Automated Prompting
• Emulating Caregiver Prompt Timing (Study 1)
    • Imbalanced Class Distribution
    • Class Overlap
• Detecting Activity Errors in Real Time (Study 1, 2)
    • One-Class Classification
Emulating Caregiver Prompt Timing
Raw Data (Study 1)
• 8 daily activities
• Prompts issued when errors were committed
Used by Algorithms
• 1 activity step = 1 training example
• 17 engineered features
• Binary class (0/1): {prompt, no-prompt}
Class Distribution
• Total number of training examples: 3980
• prompt class: 3.94%
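As a rough check on what these two figures imply, the arithmetic can be worked out directly. The slide gives only the total and the percentage, so the per-class counts below are estimates derived from those numbers, not values reported in the talk.

```python
# Figures taken from the slide.
total_examples = 3980
prompt_fraction = 0.0394  # 3.94% of examples belong to the prompt class

# Derived estimates (not stated on the slide).
prompt_examples = round(total_examples * prompt_fraction)
no_prompt_examples = total_examples - prompt_examples
imbalance_ratio = no_prompt_examples / prompt_examples

print(f"prompt examples:    ~{prompt_examples}")
print(f"no-prompt examples: ~{no_prompt_examples}")
print(f"no-prompt : prompt  ~{imbalance_ratio:.1f} : 1")
```

Under these estimates there are roughly 24 no-prompt examples for every prompt example, which is why a classifier can score high accuracy while almost never predicting a prompt.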
Imbalanced Class Distribution
Existing Solutions
Preprocessing: Sampling
• Over-sampling the minority class
• Under-sampling the majority class
Oversampling
• Based on the spatial location of training examples in Euclidean space
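The two sampling strategies can be illustrated with their simplest, purely random variants. This is a minimal sketch, not one of the methods evaluated in the talk: the function names are hypothetical, a binary problem is assumed, and the minority class is assumed to be the smaller one.

```python
import random

def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = sum(1 for y in labels if y != minority)
    extra = [rng.choice(minority_rows)
             for _ in range(n_majority - len(minority_rows))]
    return list(rows) + extra, list(labels) + [minority] * len(extra)

def random_undersample(rows, labels, minority, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    kept = sorted(rng.sample(majority_idx, len(minority_idx)) + minority_idx)
    return [rows[i] for i in kept], [labels[i] for i in kept]

rows = [[i] for i in range(8)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]  # class 1 is the minority
over_rows, over_labels = random_oversample(rows, labels, minority=1)
under_rows, under_labels = random_undersample(rows, labels, minority=1)
```

Random oversampling only repeats existing points, and random under-sampling discards majority data blindly.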
Proposed Approach
Preprocessing technique to oversample the minority class
• Approximate the discrete probability distribution using Chow-Liu’s algorithm
• Generate new minority class data points using Gibbs sampling
Gibbs Sampling
[Figure: Markov chains used for Gibbs sampling; legend: minority class samples, majority class samples]
RACOG & wRACOG
(wrapper-based) RApidly COnverging Gibbs Sampler
                    RACOG                              wRACOG
Sample selection    Pre-defined lag on Markov chain    Highest probability of misclassification by wrapper classifier
Stopping criteria   Pre-defined number of iterations   No improvement of a performance measure
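The RACOG column (Gibbs sampling with a pre-defined lag and a fixed number of samples) can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the dependence tree is passed in as a `parent` array (the kind of structure Chow-Liu's algorithm would produce) rather than learned here, conditional probabilities are estimated with Laplace smoothing, and the `burn_in` parameter and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_tree_model(data, parent, n_values, alpha=1.0):
    """Estimate P(x_i | x_parent(i)) from minority data with Laplace smoothing.
    parent[i] is the index of attribute i's parent, or None for the root."""
    counts = [Counter() for _ in parent]
    for row in data:
        for i, p in enumerate(parent):
            parent_value = row[p] if p is not None else None
            counts[i][(parent_value, row[i])] += 1

    def cond_prob(i, value, parent_value):
        total = sum(counts[i][(parent_value, v)] for v in range(n_values[i]))
        return (counts[i][(parent_value, value)] + alpha) / (total + alpha * n_values[i])

    return cond_prob

def gibbs_oversample(data, parent, n_values, n_new, burn_in=50, lag=5, seed=0):
    """Draw n_new synthetic minority samples, keeping every lag-th chain state."""
    rng = random.Random(seed)
    cond_prob = fit_tree_model(data, parent, n_values)
    children = [[c for c, p in enumerate(parent) if p == i]
                for i in range(len(parent))]
    state = list(rng.choice(data))  # assumption: start from a real minority sample
    samples, sweep = [], 0
    while len(samples) < n_new:
        for i in range(len(parent)):
            parent_value = state[parent[i]] if parent[i] is not None else None
            weights = []
            for v in range(n_values[i]):
                # Full conditional of x_i in a tree: its own term times its children's terms.
                w = cond_prob(i, v, parent_value)
                for c in children[i]:
                    w *= cond_prob(c, state[c], v)
                weights.append(w)
            state[i] = rng.choices(range(n_values[i]), weights=weights)[0]
        sweep += 1
        if sweep > burn_in and (sweep - burn_in) % lag == 0:
            samples.append(tuple(state))
    return samples

minority = [(0, 1, 1), (0, 1, 0), (1, 0, 1), (0, 1, 1)]
new_points = gibbs_oversample(minority, parent=[None, 0, 1],
                              n_values=[2, 2, 2], n_new=10)
```

Keeping only every `lag`-th state reduces the correlation between consecutive samples from the chain. The wRACOG column replaces the fixed lag and sample count with decisions driven by a wrapper classifier, which this sketch does not attempt.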
Experimental Setup
Datasets              Approaches            Classifiers
Study 1 (Prompting)   Baseline Classifier   C4.5 Decision Tree
9 UCI Datasets        SMOTE                 SVM
                      SMOTEBoost            K-Nearest Neighbor
                      RUSBoost              Logistic Regression
                      Baseline Prompting
                      RACOG
                      wRACOG
Results (True Positive Rate)
[Bar chart; y-axis: true positive rate, 0 to 1]
Results (G-mean)
[Bar chart; y-axis: G-mean, 0 to 1]
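The two reported measures can be computed as in the following sketch. It assumes the usual definitions, which the slides do not spell out: true positive rate is the fraction of positive (prompt) examples predicted correctly, and G-mean is the geometric mean of the true positive rate and the true negative rate. The function names are illustrative, and both classes are assumed to be present in `y_true`.

```python
import math

def true_positive_rate(y_true, y_pred, positive=1):
    """Fraction of positive examples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)

def true_negative_rate(y_true, y_pred, positive=1):
    """Fraction of negative examples that were predicted negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tn / (tn + fp)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive and true negative rates."""
    return math.sqrt(true_positive_rate(y_true, y_pred, positive)
                     * true_negative_rate(y_true, y_pred, positive))

y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
score = g_mean(y_true, y_pred)               # TPR = 0.5, TNR = 0.75
always_negative = g_mean(y_true, [0] * 6)    # a majority-only predictor
```

A classifier that always predicts the majority class gets a G-mean of 0 regardless of its accuracy, which makes the measure useful on imbalanced data.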
Class Overlap
Class Overlap in Prompting Data
[Figure: 3-dimensional PCA plot of prompting data]
Tomek Links
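A Tomek link is a pair of examples from opposite classes that are each other's nearest neighbor, so such pairs mark points on the class boundary or in overlapping regions. The sketch below finds them by brute force with Euclidean distance; the function name is illustrative and the approach is quadratic in the number of points.

```python
import math

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

points = [(0, 0), (0.1, 0), (5, 5), (5.2, 5), (9, 9)]
labels = [0, 1, 0, 0, 1]
links = tomek_links(points, labels)
```

In this toy data only the first two points form a link: they are mutual nearest neighbors with different labels, whereas the pair at (5, 5) and (5.2, 5) shares a label.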
Cluster-Based Under-Sampling (ClusBUS)
• Form clusters
• Under-sample clusters
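The two steps can be sketched as follows. This is a simplified illustration rather than the authors' procedure: cluster assignments are assumed to be already available (for example from a density-based clusterer), the rule for picking clusters to under-sample is an assumed minority-fraction threshold `tau`, and the function name is hypothetical.

```python
from collections import defaultdict

def cluster_based_undersample(labels, cluster_ids, minority=1, tau=0.3):
    """Return the indices to keep after dropping majority-class points from
    mixed clusters whose minority fraction is at least tau."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    keep = []
    for idxs in members.values():
        n_minority = sum(1 for i in idxs if labels[i] == minority)
        # A cluster is treated as overlapping if it mixes both classes
        # and the minority class is sufficiently represented in it.
        overlapping = 0 < n_minority < len(idxs) and n_minority / len(idxs) >= tau
        for i in idxs:
            if overlapping and labels[i] != minority:
                continue  # drop the majority point from the overlapping cluster
            keep.append(i)
    return sorted(keep)

labels      = [1, 0, 0, 0, 0, 0, 1, 0]
cluster_ids = [0, 0, 1, 1, 1, 1, 2, 2]
kept = cluster_based_undersample(labels, cluster_ids)
```

Majority points are removed only where the classes mix, so pure majority clusters stay intact. Points that a clusterer marks as noise would need separate handling, which this sketch leaves out.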
ClusBUS Ensemble
Experimental Setup
Dataset                Approaches         Classifiers
Study 1 (Prompting)    Baseline           C4.5 Decision Tree
                       SMOTE              Naïve Bayes
Clustering Algorithm   ClusBUS            K-Nearest Neighbor
DBSCAN                 ClusBUS Ensemble   SVM
Result (True Positive Rate)
[Bar chart: true positive rate (0 to 1) by classifier (C4.5, Naïve Bayes, IBk, SMO) for Baseline, SMOTE, ClusBUS, and ClusBUS Ensemble]
Result (G-mean)
[Bar chart: G-mean (0 to 1) by classifier (C4.5, Naïve Bayes, IBk, SMO) for Baseline, SMOTE, ClusBUS, and ClusBUS Ensemble]
Detecting Activity Errors in Real Time
• Sensor events labeled with activity steps
• Availability of information on activity errors
Basic Idea
• Train: one-class classifier on normal activity data from participants with no reported errors
• Test: activity data with errors from participants who committed errors
DERT Data
Raw Data
• 580 participants
• 6 daily activities
• Annotated for error start times
Used by Algorithms
• 1 sensor event = 1 training example
• >70 engineered features
• One-class (1): {normal}
One-Class SVM
[Figure: one-class SVM illustrated in a two-dimensional feature space with axes x1 and x2]
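For reference, one widely used formulation of the one-class SVM is the ν-formulation of Schölkopf et al. The slide does not say which variant was used, so treating it as this one is an assumption. Given normal training points x_1, ..., x_n, a feature map Φ, and a parameter ν in (0, 1] that bounds the fraction of training points treated as outliers:

```latex
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\qquad\text{subject to}\qquad
w\cdot\Phi(x_i) \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

f(x) = \operatorname{sgn}\bigl(w\cdot\Phi(x) - \rho\bigr)
```

A test point with f(x) = +1 falls inside the learned region and is treated as normal; f(x) = -1 marks it as an outlier, which in this setting would be a candidate activity error.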
Model Selection
Activity Error Classification
WHY? To characterize change in daily activities of older adults
HOW? Sensor data
          Error Types   Accuracy*
Study 1   4             73%
Study 2   9             54%
*Using C4.5 decision tree and 10-fold CV
Activity Error Models
• One-Class
• Multi-Class
Ensembles
• A test sample is passed to the One-Class SVM and to an error model (one-class or multi-class)
• The two outputs are combined with a logical AND
• Output: Normal/Error
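The combination step can be sketched as below. The slide only names a logical AND, so the reading used here is an assumption: a sample is reported as an error only when both the one-class SVM and the error model flag it. The function name and the boolean encoding (True means "error") are illustrative.

```python
def ensemble_flags(ocsvm_flags, error_model_flags):
    """Combine per-sample error flags from two models with a logical AND."""
    if len(ocsvm_flags) != len(error_model_flags):
        raise ValueError("both models must score the same samples")
    return [a and b for a, b in zip(ocsvm_flags, error_model_flags)]

ocsvm_flags       = [True, True, False, False]   # one-class SVM says "outlier"
error_model_flags = [True, False, True, False]   # error model says "error"
combined = ensemble_flags(ocsvm_flags, error_model_flags)
labels = ["Error" if flag else "Normal" for flag in combined]
```

Requiring agreement can only remove error predictions, so under this reading the ensemble trades some recall for fewer false positives.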
Experimental Setup
Datasets                     Approaches
Study 1 (400 participants)   Baseline
Study 2 (180 participants)   OCSVM
                             OCSVM + OCEM
                             OCSVM + MCEM
Results: Study 1
[Two bar charts, recall (y-axis 0 to 1) and precision (y-axis 0 to 0.16), by activity (Sweeping and Dusting, Taking Medication, Watering Plants, Cooking) for Baseline, OCSVM, OCSVM+OCEM, and OCSVM+MCEM]
Results: Study 2
[Two bar charts, recall (y-axis 0 to 0.7) and precision (y-axis 0 to 0.07), by activity (Sweeping and Dusting, Cleaning Countertops, Taking Medication, Watering Plants, Washing Hands, Cooking) for Baseline, OCSVM, OCSVM+OCEM, and OCSVM+MCEM]
Clinical Evaluation
• Evaluation of algorithm-predicted false positives
• Psychology clinician looked at participants’ videos
• Actually true positives: 33%
• Continuation of previous error: 18%
Conclusion
Summary
• Emulate caregiver intervention.
• Class imbalance and overlap.
• Detect activity errors in real time.
Significance
• Validated primary hypothesis.
• Foundation of a real-world prompting system.
Future Work
• RACOG and wRACOG for continuous values.
• ClusBUS in other domains.
• Precise annotation for activity errors.
Publications
Book Chapter
• B. Das, N.C. Krishnan, D.J. Cook, “Handling Imbalanced and Overlapping Classes in Smart Environments Prompting Dataset”, Springer book on Big Data, 2014.
• B. Das, N.C. Krishnan, D.J. Cook, “Automated Activity Intervention to Assist with Activities of Daily Living”, IOS Press book on Agent-Based Approaches to Ambient Intelligence, 2012.
Journal
• B. Das, N.C. Krishnan, D.J. Cook, “Real-Time Activity Error Prediction to Assist Older Adults in Smart Homes: An Outlier Detection-Based Approach”, AI in Medicine, 2014. (Submitted)
• B. Das, N.C. Krishnan, D.J. Cook, “RACOG and wRACOG: Two Probabilistic Oversampling Techniques”, IEEE Transactions on Knowledge and Data Engineering, 2014.
• A.M. Seelye, M. Schmitter-Edgecombe, B. Das, D.J. Cook, “Application of cognitive rehabilitation theory to the development of smart prompting technologies”, IEEE Reviews in Biomedical Engineering, 2012.
• B. Das, D.J. Cook, M. Schmitter-Edgecombe, A.M. Seelye, “PUCK: An Automated Prompting System for Smart Environments”, Journal on Personal and Ubiquitous Computing, 2012.
Publications
Conference
• B. Das, N.C. Krishnan, D.J. Cook, “wRACOG: A Gibbs Sampling-Based Oversampling Technique”, International Conference on Data Mining, 2013.
• S. Dernbach, B. Das, N.C. Krishnan, B.L. Thomas, D.J. Cook, “Simple and Complex Activity Recognition Through Smart Phones”, International Conference on Intelligent Environments, 2012.
• B. Das, C. Chen, A.M. Seelye, D.J. Cook, “An Automated Prompting System for Smart Environments”, International Conference on Smart Homes and Health Telematics, 2011.
• E. Nazerfard, B. Das, L.B. Holder, D.J. Cook, “Conditional Random Fields for Activity Recognition in Smart Environments”, ACM Symposium on Human Informatics, 2010.
• C. Chen, B. Das, D.J. Cook, “A Data Mining Framework for Activity Recognition in Smart Environments”, International Conference on Intelligent Environments, 2010.
Workshop
• B. Das, N.C. Krishnan, D.J. Cook, “Handling Imbalanced and Overlapping Classes in Smart Environments”, ICDM Workshop in Data Mining in Bioinformatics and Healthcare, 2013.
• B. Das, A.M. Seelye, B.L. Thomas, D.J. Cook, L.B. Holder, “Using Smart Phones for Context-Aware Prompting in Smart Environments”, International Workshop on Consumer eHealth Platforms, Services and Applications, 2012.
• B. Das, D.J. Cook, “Data Mining Challenges in Automated Prompting Systems”, Interactions with Smart Objects Workshop, 2011.
• B. Das, C. Chen, N. Dasgupta, D.J. Cook, “Automated Prompting in Smart Home Environment”, ICDM Workshop on Data Mining Services, 2010.
• C. Chen, B. Das, D.J. Cook, “Energy Prediction Using Resident’s Activity”, International Workshop on Knowledge Discovery from Sensor Data, 2010.
Acknowledgement
Dr. Diane Cook                    Prafulla Dawadi       Adri Seelye
Dr. Larry Holder                  Dr. Ehsan Nazerfard   Carolyn Parsey
Dr. Narayanan C. Krishnan (CK)    Dr. Kyle Feuz         Christa Simon
Dr. Maureen Schmitter-Edgecombe   Brian Thomas          Alyssa Weakley
Dr. Behrooz Shirazi               Chris Cain            Jennifer Williams
Dr. Alex Mihailidis               Shirin Shahsavand
Dr. Aaron Crandall
Dr. Hassan Ghasemzadeh
And, all previous colleagues, collaborators and friends…

Editor's Notes

  • #3 As life expectancy rises, a large share of the older adult population is becoming susceptible to cognitive impairments such as Alzheimer’s disease and dementia.
  • #5 One key service that caregivers provide is prompting individuals with memory limitations to initiate and complete daily activities. There is therefore a growing need for assistive living technologies that help older adults with their daily activities and thereby reduce the burden on caregivers.