Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes
1. Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes
Barnan Das, Narayanan C. Krishnan, Diane J. Cook
Barnan Das
School of Electrical Engineering and Computer Science
Washington State University
Self-portraits by William Utermohlen, an American artist living in London, after he was diagnosed with Alzheimer’s disease in 1995. Utermohlen died from the consequences of Alzheimer’s disease in March 2007.
2. Dementia and Alzheimer’s in numbers
• 36 million: worldwide dementia population
• Actual and expected number of Americans aged 65 and older with Alzheimer’s: 5.1m (2010), 7.7m (2030), 13.2m (2050)
• $200 billion: payment for care in 2012
• 15 million: unpaid caregivers
Source: World Health Organization and Alzheimer’s Association.
5. Existing Work
• Rule-based (temporal or contextual)
• Activity initiation
• RFID and video-input based prompts for activity steps
Our Contribution
• Learning-based
• Sub-activity level prompts
• No audio/video input
11. Existing Approaches
• Discard data of the overlapping region
• Treat overlapping region as a separate class
• Polynomial combination of existing features
• Using kernel methods
14. Two Critical Components
Choice of Clustering Algorithm: DBSCAN
• Density-based
• Non-spherical clusters
• No need to predetermine number of clusters
Determining Candidate Clusters: Empirically Determined
• Based on minority class dominance (r) in clusters
• Threshold determined by quantile values of r
19. Conclusion
• Automated prompting as a classification problem
• Proposed ClusBUS: under-sampling-based preprocessing
• Solving class overlap helps address class imbalance
24. Feature Generation
Feature # | Feature Name | Description
1 | stepLength | Length of the step in time (seconds)
2 | numSensors | Number of unique sensors involved with the step
3 | numEvents | Number of sensor events associated with the step
4 | prevStep | Previous step
5 | nextStep | Next step
6 | timeActBegin | Time (seconds) elapsed since the beginning of the activity
7 | timePrevStep | Time (seconds) difference between the last event of the previous step and the first event of the current step
8 | stepsActBegin | Number of steps visited since the start of the activity
9 | activityID | Activity ID
10 | stepID | Step ID
11 | location | Set of features representing sensor frequencies in kitchen, dining room, living room, etc. when the activity was performed
12 | Class | Binary class. 1-”Prompt”, 0-”No-Prompt”
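Several of the features listed above can be derived directly from the sensor events of a single step. The following is a minimal sketch, not the authors' code. It assumes each event is a (timestamp in seconds, sensor ID) tuple; the event format, the function name `step_features`, and the sensor IDs in the example are all hypothetical. It also assumes that timeActBegin is measured at the first event of the step.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (timestamp in seconds, sensor identifier).
Event = Tuple[float, str]


def step_features(events: List[Event], activity_start: float,
                  prev_step_last_event: float) -> Dict[str, float]:
    """Compute a few of the listed features for one activity step."""
    if not events:
        raise ValueError("a step needs at least one sensor event")
    times = [t for t, _ in events]
    first, last = min(times), max(times)
    return {
        # Duration of the step in seconds.
        "stepLength": last - first,
        # Number of distinct sensors that fired during the step.
        "numSensors": len({sensor for _, sensor in events}),
        # Total number of sensor events in the step.
        "numEvents": len(events),
        # Assumption: measured from activity start to the step's first event.
        "timeActBegin": first - activity_start,
        # Gap between the previous step's last event and this step's first.
        "timePrevStep": first - prev_step_last_event,
    }


# Made-up example: three events from two sensors.
example = step_features(
    [(12.0, "M017"), (14.5, "D002"), (20.0, "M017")],
    activity_start=0.0,
    prev_step_last_event=9.0,
)
```

The remaining features (prevStep, nextStep, stepsActBegin, location) depend on the annotated step sequence and the sensor layout, so they are left out of this sketch.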
Editor's Notes
The background image is a collection of self-portraits by William Utermohlen, an American painter living in London. When he was diagnosed with Alzheimer’s in 1995, he decided to document his experience in a series of self-portraits as the disease progressed. Utermohlen died from the consequences of Alzheimer’s in 2007.
A few numbers from the World Health Organization and the Alzheimer’s Association that highlight the philosophical motivation of this work: There are currently 36 million people in the world living with dementia. In the US alone, there were over 5 million Alzheimer’s patients in 2010. By 2050, this number is projected to be as high as 13.2 million. Caring for these patients cost America $200 billion in 2012. Moreover, there are currently 15 million unpaid caregivers, usually family members, taking care of these patients. As 45% of the unpaid caregivers are 55 and older, caring for their loved ones causes high levels of emotional stress and depression.
Therefore, there is a growing need for assistive living technologies that reduce the burden on caregivers and help older adults age in place. Helping with Activities of Daily Living (ADLs) is one of the primary objectives and research directions in Smart Environment research.
In my research, I address the automated prompting challenge: helping older adults with their daily activities. Specifically, I address the machine learning challenges of tracking activity steps from in-home sensor data and predicting potential prompt situations while an older adult performs an activity.
Existing work on automated prompting mainly deals with rule-based prompts for activity initiation. Other work handles prompt situations for activity steps, but relies on either RFID tags or video input. The contribution of the current work is a machine learning-based prompting system that tracks activity steps and predicts potential prompt situations using the group’s current infrastructure, which involves neither RFID tags nor video input.
A brief architectural overview before we dig into the algorithmic challenges: We collect daily human activity data from on-campus and off-campus smart homes equipped with a diverse sensor suite that includes motion, object, door, light and power sensors. We also exploit the sensors available on smartphones. The raw sensor data collected from experiments with human participants are passed to human annotators, who label the data with daily activities and the corresponding steps. These activity labels act as ground truth for evaluating our proposed learning models. The ground-truth information is also used to generate distinguishing features of activities and activity steps, which are fed into machine learning models to predict prompt situations. Ultimately, the predicted prompts could be issued to a smart home inhabitant through a prompting device. My contribution is in engineering distinguishing activity features and building machine learning models to predict potential prompt situations for ADLs.
So let’s see what the data represent and what we are trying to achieve. We use sensor data collected from 300 older-adult participants who performed 8 different activities of daily living in our on-campus smart home. An experimenter monitored the participants and issued prompts wherever necessary. For example, in the cooking activity, a prompt was issued when the participant forgot to heat the water in the microwave that was needed to cook a cup of noodles. The raw sensor data are labeled with activities, predefined activity steps, and whether a prompt was issued for a specific activity step of a participant. Thus, after the primary preprocessing, each activity step of a participant corresponds to one data point. I have engineered 17 different attributes for these activity steps, such as the frequency of sensor triggering at specific locations, the duration of the activity step, and the time elapsed between activity initiation and the current step. The goal is to classify these steps into prompt and no-prompt classes based on ground-truth information collected from the experimenter and the annotator. So, as you can tell, it is a binary classification problem.
With the features that we generate from the raw data, it is sometimes impossible for a classifier to determine whether a data point belongs to the prompt class or the no-prompt class. This is the class overlap problem. It turns out that when class overlap occurs together with an imbalanced class distribution, removing the overlap also reduces the adverse effects of class imbalance to some extent. The problem also exists in other domains such as character recognition, credit card fraud detection, and drug design.
The class overlap problem becomes clear from a 3D PCA plot of the prompting data.
Solutions in the literature either discard the data points in the overlapping region or treat the overlapping region as a separate class. However, neither approach works in the prompting domain. First, the prompting data suffer from absolute rarity of minority class instances, so throwing away data points makes the problem even worse. Second, treating the overlapping region as a separate class does not serve our purpose of accurately classifying prompt and no-prompt activity steps. Now, you might think that using a parametric machine learning method and generating new features by some polynomial combination of existing features would solve this issue. However, it appears that it doesn’t. So, we take a preprocessing approach to handle the overlapping classes.
Our solution is motivated by the concept of Tomek links, which are pairs of minority and majority class data points that are minimally distant from each other, that is, each point is the other’s nearest neighbor. This means that the points in a Tomek link either represent noise or lie on the boundary between the two classes. Removing the majority class instances of the Tomek links can therefore help in learning the minority class better.
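To make the idea concrete, here is a minimal pure-Python sketch of finding Tomek links and removing their majority class members. It is an illustration, not the implementation used in this work. It assumes Euclidean distance, at least two data points, and binary labels with 0 as the majority class; the function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_neighbor(i: int, points: Sequence[Point]) -> int:
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points: Sequence[Point],
                labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points: Sequence[Point], labels: Sequence[int],
                             majority_label: int = 0):
    """Drop only the majority class member of every Tomek link."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data: points 0 and 1 are mutual nearest neighbors of opposite classes.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (5.0, 5.0)]
labels = [0, 1, 0, 0, 1]
links = tomek_links(points, labels)
kept_points, kept_labels = remove_majority_in_links(points, labels)
```

In the toy data only the pair (0, 1) forms a Tomek link, so only the majority point at index 0 is removed. The brute-force neighbor search is quadratic in the number of points, which is acceptable for a sketch but not for large datasets.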
Our proposed approach is very similar. However, instead of identifying the overlapping region by finding Tomek links, we use clustering. Cluster-Based Under-Sampling, or ClusBUS, first identifies the overlapping region in the data by clustering. Clusters that contain a good mix of both minority and majority class samples are considered for under-sampling, and the majority class samples are removed from these clusters. This creates a void around the minority class samples and thus helps in learning them better.
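The under-sampling step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes the clustering has already been run and that a cluster ID is available for every sample. It further assumes that minority class dominance r is the fraction of minority samples in a cluster, that a cluster is a candidate when r is positive and at least a given threshold, and that noise points carry the ID -1 (the convention of scikit-learn's DBSCAN) and are left untouched. All function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def minority_dominance(labels: Sequence[int], cluster_ids: Sequence[Hashable],
                       minority_label: int = 1,
                       noise_id: Hashable = -1) -> Dict[Hashable, float]:
    """Assumed definition of r: fraction of minority samples per cluster."""
    if len(labels) != len(cluster_ids):
        raise ValueError("labels and cluster_ids must have the same length")
    sizes = Counter(cluster_ids)
    minority = Counter(
        c for c, y in zip(cluster_ids, labels) if y == minority_label
    )
    # Noise points do not form a cluster, so they get no r value.
    return {c: minority[c] / sizes[c] for c in sizes if c != noise_id}


def clusbus_undersample(labels: Sequence[int], cluster_ids: Sequence[Hashable],
                        threshold: float, minority_label: int = 1,
                        noise_id: Hashable = -1) -> List[int]:
    """Return the indices of the samples kept after under-sampling."""
    r = minority_dominance(labels, cluster_ids, minority_label, noise_id)
    # Candidate clusters: contain minority samples and reach the threshold.
    candidates = {c for c, value in r.items() if value > 0 and value >= threshold}
    return [
        i
        for i, (c, y) in enumerate(zip(cluster_ids, labels))
        # Remove only majority samples that sit inside candidate clusters.
        if not (c in candidates and y != minority_label)
    ]


# Toy example: cluster 0 is pure majority, cluster 1 is mixed (r = 0.5),
# cluster 2 is mostly majority (r = 0.25), and the last sample is noise.
labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0]
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, -1]
r_values = minority_dominance(labels, cluster_ids)
kept = clusbus_undersample(labels, cluster_ids, threshold=0.4)
```

With a threshold of 0.4, only cluster 1 qualifies, so its two majority samples (indices 4 and 6) are removed while every minority sample is kept. The threshold is a free parameter here; one option consistent with the slides would be to pick it from the quantiles of the r values, for example with `statistics.quantiles`.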