4. Activity Inference Process
• Pipeline: Raw data → Discretization → Classifier construction
• Discretization: MDLP, LGD
• Classifier construction: Decision Tree, Naïve Bayesian, K-Nearest Neighbor, SVM
5. Activity Inference Process
• Pipeline (sketched below): Raw data → Discretization → Feature-Value selection → Classifier construction
• Discretization: MDLP, LGD
• Feature-Value selection: ONEFVAS, GIFVAS, CBFVAS
• Classifier construction: Decision Tree, Naïve Bayesian, K-Nearest Neighbor, SVM
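A minimal end-to-end sketch of this pipeline, with scikit-learn standing in for the Weka stack used in the experiments; KBinsDiscretizer is unsupervised (unlike MDLP/LGD), so the discretization step here is only an approximation, and the selection step is left as a placeholder:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # raw sensor readings (toy data)
y = rng.integers(0, 2, size=200)   # activity labels (toy data)

# 1) Discretization: continuous readings -> intervals; with one-hot encoding,
#    each output column corresponds to one feature-value pair.
disc = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="uniform")
X_fv = disc.fit_transform(X)

# 2) Feature-value selection (ONEFVAS/GIFVAS/CBFVAS) would drop columns here.
# 3) Classifier construction on the remaining feature-values.
clf = DecisionTreeClassifier().fit(X_fv, y)
print(clf.score(X_fv, y))
```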
6. Feature-Value Selection
• What is a feature-value?
• A range of sensor readings
• e.g. Accelerometer magnitude high, GPS at home, light bright
• Why use feature-values?
• A sensor reading's relation to an activity can be relevant or not
• e.g. Accelerometer magnitude readings (see the sketch below)
(Figure: accelerometer traces labeled "Accelerometer: Low", "Accelerometer: Low", and "Accelerometer: High")
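A minimal sketch (not the paper's code; the records and names are hypothetical) of how a feature-value's relevance can be scored: compute the entropy of the activity labels observed with each feature-value, where low entropy means the interval is pure and therefore informative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of the activity labels seen with one feature-value."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

# Hypothetical discretized records: feature-value pairs plus an activity label.
records = [
    ({"acc_magnitude": "high", "light": "bright"}, "walking"),
    ({"acc_magnitude": "high", "light": "dark"},   "walking"),
    ({"acc_magnitude": "low",  "light": "bright"}, "working"),
    ({"acc_magnitude": "low",  "light": "dark"},   "sleeping"),
]

# Group activity labels by feature-value pair, then score each pair.
by_fv = {}
for features, activity in records:
    for feature, value in features.items():
        by_fv.setdefault((feature, value), []).append(activity)

for fv, labels in sorted(by_fv.items()):
    print(fv, round(entropy(labels), 3))
# ('acc_magnitude', 'high') scores 0.0: it always co-occurs with "walking".
```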
9. Iteration-based (GIFVAS)
• Loop over the entropy threshold, selecting feature-values iteratively (see the sketch below)
• Evaluate accuracy at each iteration
• If the accuracy reduction is big
• Cancel the selection of this iteration and tag the removed feature-values as special
• Special feature-values are retained until the last iteration
• Special feature-values are:
• Frequent but confusing
• Pure but infrequent
(Charts: Accuracy and number of Feature-Value Pairs vs. Entropy Threshold, swept from 1.0 down to 0.0)
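A minimal sketch of the iteration, assuming an `entropy_of` scorer, an `evaluate` helper that trains and scores a classifier on a feature-value subset, and an accuracy tolerance `max_drop`; all names are hypothetical, not the paper's code:

```python
def gifvas(feature_values, entropy_of, evaluate, max_drop=0.02):
    """Sweep the entropy threshold downward, keeping 'special' feature-values."""
    selected = set(feature_values)
    special = set()
    baseline = evaluate(selected)
    for threshold in [t / 10 for t in range(10, -1, -1)]:  # 1.0 down to 0.0
        # Candidate: drop feature-values whose entropy exceeds the threshold,
        # but never drop the ones already tagged as special.
        candidate = {fv for fv in selected if entropy_of(fv) <= threshold} | special
        if baseline - evaluate(candidate) > max_drop:
            # Big accuracy drop: cancel this iteration's removals and tag them
            # as special so they are retained until the last iteration.
            special |= selected - candidate
        else:
            selected = candidate
    return selected | special
```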
10. Correlation-based (CBFVAS)
• Use Pearson correlation at the feature level
• Use entropy at the feature-value level
• For each feature-value pair (see the sketch below):
• Generate the correlated feature-values
• Sort the correlated feature-values by entropy
• Keep only the best-N feature-values among them
• Discard the other feature-values
(Charts: Accuracy and Model Size vs. best-N feature-values remained, comparing Correlation against Original)
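A minimal sketch of the correlation-based grouping, assuming numeric feature columns (a dict of name -> array), feature-values as (feature, interval) tuples, and an `entropy_of` scorer; the names and the correlation cutoff are hypothetical, not the paper's code:

```python
import numpy as np

def cbfvas(columns, feature_values, entropy_of, best_n=5, corr_threshold=0.7):
    """Group feature-values of correlated features; keep the best-N by entropy."""
    kept = set()
    for feature in columns:
        # Features whose Pearson correlation with `feature` is high form one group.
        group = [
            other for other in columns
            if abs(np.corrcoef(columns[feature], columns[other])[0, 1]) >= corr_threshold
        ]
        # Sort the group's feature-values by entropy (lower = purer interval)
        # and keep only the best-N; the rest are discarded.
        group_fvs = sorted((fv for fv in feature_values if fv[0] in group),
                           key=entropy_of)
        kept.update(group_fvs[:best_n])
    return kept
```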
11. Experiments
• Environment:
• Intel Quad Core 2.66 GHz
• RAM 8 GB
• Java 7
• Weka 3.6.11 (all default parameters)
• Datasets:
• Collected from 11 participants
• At least 2 different activities, up to 6 activities
• Average 3 weeks, maximum 2 months
• Classifier algorithms:
• Naïve Bayesian
• Decision Tree (J48)
• SVM (SMO)
• k-Nearest Neighbor (kNN)
12. Experiments (Model Size)
• Feature-value selection is not effective on Naïve Bayesian
• In general, feature-value selection works best on Decision Tree
(Charts: relative Model Size of Original, ONEFVAS, GIFVAS, and CBFVAS under LGD and MDLP discretization, for Naïve Bayes, Decision Tree, kNN, and SVM)
15. Conclusions
• Proposed feature-value selection for reducing model size
• ONEFVAS – using an entropy threshold
• GIFVAS – iterating over the entropy threshold
• CBFVAS – using correlation and entropy
• The proposed methods reduce model size while maintaining accuracy
• Performance varies with the discretization and classification algorithms
• Decision Tree gets the most benefit
16. Thank you
On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Presented by: Gunarto Sindoro Njoo
GUNARTO.NCTU@GMAIL.COM
Editor's Notes
[Quick Explanation]
Motivation for activity inference:
Helps provide services by learning the user's context.
[1] Silent mode, phone/message filtering
[2] Navigation, reading messages out loud
[3] Tips, reviews, recommendations
[Quick Explanation]
Motivation for reducing the storage size of activity classification:
Comparing power consumption across computers, smart phones, and sensor hubs:
Computers: hundreds of Watts
Smart phones: several Watts
Sensor hubs: milliwatts
The activity inference process in general (using supervised learning).
Why do we need discretization?
Because some classification algorithms need intervals to work well, e.g. decision tree, rule-based.
Here we consider 2 supervised discretization methods based on information theory (GINI for LGD and Information Gain for MDLP); a sketch of the information-gain criterion follows.
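A minimal sketch of the information-gain criterion that MDLP-style discretization uses to score a candidate cut point (toy data; not the paper's code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(values, labels, cut):
    """Entropy reduction from splitting the readings at `cut`."""
    left  = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

readings = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]   # accelerometer magnitudes
acts = ["sit", "sit", "sit", "walk", "walk", "walk"]
print(info_gain(readings, acts, cut=0.7))    # 1.0: a perfect split
```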
Feature-value selection is inserted between discretization and classifier building.
We introduce 3 methods for feature-value selection.
Before going further, we first need to explain what a feature-value is.
The goal of feature-value selection is to reduce the classifier's model size, because some of the intervals carry little meaning.
Why not feature selection?
Because the number of sensors in a smart phone is limited, and removing one reduces accuracy greatly.
This approach can also complement feature selection, and it can do well with fewer features.
We can select feature-values using a threshold (sketched below).
By doing so, we can reduce the number of feature-values well.
The problem is how to set the entropy threshold.
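A minimal ONEFVAS-style sketch, assuming a hypothetical `entropy_of` scorer: keep only the feature-values whose entropy falls at or below the chosen threshold:

```python
def onefvas(feature_values, entropy_of, threshold=0.0):
    """Keep feature-values at least as pure as the entropy threshold allows."""
    return {fv for fv in feature_values if entropy_of(fv) <= threshold}
```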
GIFVAS iterates in search of the best threshold.
Even though some "special feature-values" are added back, the number of feature-value pairs is still greatly reduced.
The problem here is that the selection process is slow.
Using a big value for N makes the model size bigger too.
Feature-values are grouped by their co-occurrence in the datasets:
whether the features are correlated, and
whether the feature-values appear together in the datasets.
"Original" means "without feature-value selection".
ONEFVAS here uses an entropy threshold of 0.
LGD and MDLP are discretization methods.
Naïve Bayesian uses a matrix to represent the classification model, so its size is not reduced; only the statistical attributes (mean, stdev, var) change.
So on the following slides, we remove Naïve Bayesian and focus on the rest.
In terms of model size, ONEFVAS with Decision Tree is the best, but CBFVAS is more stable in most cases.
SVM on MDLP couldn't be run because of a limitation in Weka on the number of intervals, due to the huge number of intervals generated by MDLP discretization.
Based on those charts, we can see that Decision Tree gets the most benefit (low model size and small accuracy reduction).