4. Activity Inference Process
• Pipeline: Raw data → Discretization → Classifier construction
• Discretization: MDLP, LGD
• Classifier construction: Decision Tree, Naïve Bayesian, K-Nearest Neighbor, SVM
5. Activity Inference Process
• Pipeline (sketched below): Raw data → Discretization → Feature-Value selection → Classifier construction
• Discretization: MDLP, LGD
• Feature-Value selection: ONEFVAS, GIFVAS, CBFVAS
• Classifier construction: Decision Tree, Naïve Bayesian, K-Nearest Neighbor, SVM
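A minimal end-to-end sketch of this pipeline, with scikit-learn standing in for the Weka stack used in the experiments; KBinsDiscretizer is unsupervised (unlike MDLP/LGD), so the discretization step here is only an approximation, and the selection step is left as a placeholder:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))      # raw sensor readings (toy data)
y = rng.integers(0, 2, size=200)   # activity labels (toy data)

# 1) Discretization: continuous readings -> intervals; with one-hot encoding,
#    each output column corresponds to one feature-value pair.
disc = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="uniform")
X_fv = disc.fit_transform(X)

# 2) Feature-value selection (ONEFVAS/GIFVAS/CBFVAS) would drop columns here.
# 3) Classifier construction on the remaining feature-values.
clf = DecisionTreeClassifier().fit(X_fv, y)
print(clf.score(X_fv, y))
```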
6. Feature-Value Selection
• What is a feature-value?
• A range of sensor readings
• e.g. Accelerometer magnitude high, GPS at home, light bright
• Why use feature-values?
• A sensor reading's relation to an activity can be relevant or not
• e.g. Accelerometer magnitude readings (see the sketch below)
(Figure: accelerometer traces labeled "Accelerometer: Low", "Accelerometer: Low", and "Accelerometer: High")
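A minimal sketch (not the paper's code; the records and names are hypothetical) of how a feature-value's relevance can be scored: compute the entropy of the activity labels observed with each feature-value, where low entropy means the interval is pure and therefore informative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of the activity labels seen with one feature-value."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

# Hypothetical discretized records: feature-value pairs plus an activity label.
records = [
    ({"acc_magnitude": "high", "light": "bright"}, "walking"),
    ({"acc_magnitude": "high", "light": "dark"},   "walking"),
    ({"acc_magnitude": "low",  "light": "bright"}, "working"),
    ({"acc_magnitude": "low",  "light": "dark"},   "sleeping"),
]

# Group activity labels by feature-value pair, then score each pair.
by_fv = {}
for features, activity in records:
    for feature, value in features.items():
        by_fv.setdefault((feature, value), []).append(activity)

for fv, labels in sorted(by_fv.items()):
    print(fv, round(entropy(labels), 3))
# ('acc_magnitude', 'high') scores 0.0: it always co-occurs with "walking".
```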
9. Iteration-based (GIFVAS)
• Loop over the entropy threshold, selecting feature-values iteratively (see the sketch below)
• Evaluate accuracy at each iteration
• If the accuracy reduction is big
• Cancel the selection of this iteration and tag the removed feature-values as special
• Special feature-values are retained until the last iteration
• Special feature-values are:
• Frequent but confusing
• Pure but infrequent
(Charts: Accuracy and number of Feature-Value Pairs vs. Entropy Threshold, swept from 1.0 down to 0.0)
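A minimal sketch of the iteration, assuming an `entropy_of` scorer, an `evaluate` helper that trains and scores a classifier on a feature-value subset, and an accuracy tolerance `max_drop`; all names are hypothetical, not the paper's code:

```python
def gifvas(feature_values, entropy_of, evaluate, max_drop=0.02):
    """Sweep the entropy threshold downward, keeping 'special' feature-values."""
    selected = set(feature_values)
    special = set()
    baseline = evaluate(selected)
    for threshold in [t / 10 for t in range(10, -1, -1)]:  # 1.0 down to 0.0
        # Candidate: drop feature-values whose entropy exceeds the threshold,
        # but never drop the ones already tagged as special.
        candidate = {fv for fv in selected if entropy_of(fv) <= threshold} | special
        if baseline - evaluate(candidate) > max_drop:
            # Big accuracy drop: cancel this iteration's removals and tag them
            # as special so they are retained until the last iteration.
            special |= selected - candidate
        else:
            selected = candidate
    return selected | special
```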
10. Correlation-based (CBFVAS)
• Use Pearson correlation at the feature level
• Use entropy at the feature-value level
• For each feature-value pair (see the sketch below):
• Generate the correlated feature-values
• Sort the correlated feature-values by entropy
• Keep only the best-N feature-values among them
• Discard the other feature-values
(Charts: Accuracy and Model Size vs. best-N feature-values remained, comparing Correlation against Original)
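A minimal sketch of the correlation-based grouping, assuming numeric feature columns (a dict of name -> array), feature-values as (feature, interval) tuples, and an `entropy_of` scorer; the names and the correlation cutoff are hypothetical, not the paper's code:

```python
import numpy as np

def cbfvas(columns, feature_values, entropy_of, best_n=5, corr_threshold=0.7):
    """Group feature-values of correlated features; keep the best-N by entropy."""
    kept = set()
    for feature in columns:
        # Features whose Pearson correlation with `feature` is high form one group.
        group = [
            other for other in columns
            if abs(np.corrcoef(columns[feature], columns[other])[0, 1]) >= corr_threshold
        ]
        # Sort the group's feature-values by entropy (lower = purer interval)
        # and keep only the best-N; the rest are discarded.
        group_fvs = sorted((fv for fv in feature_values if fv[0] in group),
                           key=entropy_of)
        kept.update(group_fvs[:best_n])
    return kept
```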
11. Experiments
• Environment:
• Intel Quad Core 2.66 GHz
• RAM 8 GB
• Java 7
• Weka 3.6.11 (all default parameters)
• Datasets:
• Collected from 11 participants
• At least 2 different activities, up to 6 activities
• Average 3 weeks, maximum 2 months
• Classifier algorithms:
• Naïve Bayesian
• Decision Tree (J48)
• SVM (SMO)
• k-Nearest Neighbor (kNN)
12. Experiments (Model Size)
• Feature-value selection is not effective on Naïve Bayesian
• In general, feature-value selection works best on Decision Tree
(Charts: relative Model Size of Original, ONEFVAS, GIFVAS, and CBFVAS under LGD and MDLP discretization, for Naïve Bayes, Decision Tree, kNN, and SVM)
15. Conclusions
• Proposed feature-value selection for reducing model size
• ONEFVAS – using an entropy threshold
• GIFVAS – iterating over the entropy threshold
• CBFVAS – using correlation and entropy
• The proposed methods reduce model size while maintaining accuracy
• Performance varies with the discretization and classification algorithms
• Decision Tree gets the most benefit
16. Thank you
On Selecting Feature-Value Pairs on Smart Phones for Activity Inferences
Presented by: Gunarto Sindoro Njoo
GUNARTO.NCTU@GMAIL.COM
Editor's Notes
[Quick Explanation]
Motivation for activity inference:
Helps provide services by learning the user's context.
[1] Silent mode, phone/message filtering
[2] Navigation, reading messages out loud
[3] Tips, reviews, recommendations
[Quick Explanation]
Motivation for reducing the storage size of activity classification:
Comparing power consumption across computers, smart phones, and sensor hubs:
Computers: hundreds of Watts
Smart phones: several Watts
Sensor hubs: milliwatts
The activity inference process in general (using supervised learning).
Why do we need discretization?
Because some classification algorithms need intervals to work well, e.g. decision tree, rule-based.
Here we consider 2 supervised discretization methods based on information theory (GINI for LGD and Information Gain for MDLP); a sketch of the information-gain criterion follows.
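A minimal sketch of the information-gain criterion that MDLP-style discretization uses to score a candidate cut point (toy data; not the paper's code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(values, labels, cut):
    """Entropy reduction from splitting the readings at `cut`."""
    left  = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

readings = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]   # accelerometer magnitudes
acts = ["sit", "sit", "sit", "walk", "walk", "walk"]
print(info_gain(readings, acts, cut=0.7))    # 1.0: a perfect split
```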
Feature-value selection is inserted between discretization and classifier building.
We introduce 3 methods for feature-value selection.
Before going further, we first need to explain what a feature-value is.
The goal of feature-value selection is to reduce the classifier's model size, because some of the intervals carry little meaning.
Why not feature selection?
Because the number of sensors in a smart phone is limited, and removing one reduces accuracy greatly.
This approach can also complement feature selection, and it can do well with fewer features.
We can select feature-values using a threshold (sketched below).
By doing so, we can reduce the number of feature-values well.
The problem is how to set the entropy threshold.
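A minimal ONEFVAS-style sketch, assuming a hypothetical `entropy_of` scorer: keep only the feature-values whose entropy falls at or below the chosen threshold:

```python
def onefvas(feature_values, entropy_of, threshold=0.0):
    """Keep feature-values at least as pure as the entropy threshold allows."""
    return {fv for fv in feature_values if entropy_of(fv) <= threshold}
```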
GIFVAS iterates in search of the best threshold.
Even though some "special feature-values" are added back, the number of feature-value pairs is still greatly reduced.
The problem here is that the selection process is slow.
Using a big value for N makes the model size bigger too.
Feature-values are grouped by their co-occurrence in the datasets:
whether the features are correlated, and
whether the feature-values appear together in the datasets.
"Original" means "without feature-value selection".
ONEFVAS here uses an entropy threshold of 0.
LGD and MDLP are discretization methods.
Naïve Bayesian uses a matrix to represent the classification model, so its size is not reduced; only the statistical attributes (mean, stdev, var) change.
So on the following slides, we remove Naïve Bayesian and focus on the rest.
In terms of model size, ONEFVAS with Decision Tree is the best, but CBFVAS is more stable in most cases.
SVM on MDLP couldn't be run because of a limitation in Weka on the number of intervals, due to the huge number of intervals generated by MDLP discretization.
Based on those charts, we can see that Decision Tree gets the most benefit (low model size and small accuracy reduction).