1/35
UCAmI
2015
DeustoTech-Deusto Institute of Technology, University of Deusto
http://www.morelab.deusto.es
December 2, 2015
Facing up social activity recognition using smartphone sensors
Pablo Curiel, Ivan Pretel, Ana B. Lago
2/35
Outline
Introduction
System Design
Evaluation
Conclusion
3/35
Introduction
System Design
Evaluation
Conclusion
4/35
Introduction
AT HOME
5/35
Introduction
6/35
Introduction
► Location-based services
► Foursquare, Twitter, Google Keep,…
► Low-level inference
► Physical activity: walking, running, cycling,…
► High-level inference
► High-level user activities: cooking, reading a novel,…
► Environments or surroundings: home, bar, public transport
7/35
Introduction
► Socialization as a high-level user activity
► based on environment recognition
► provides “social reminders”
8/35
Introduction
System Design
Evaluation
Conclusion
9/35
System Design: Context capture
► Environments
► Bar, café, sports bar, disco and restaurant
► Characteristics
► Noisy places
► Stationary positions
► Artificially lighted places
10/35
System Design: Context capture
► Captured Data
► Audio
► RMS power and dBs
► Microphone
► Acceleration
► 3-axial acceleration
► Acceleration, gyroscope and geomagnetic sensors
► Ambient luminosity
► Luxes
► Luminosity sensor
► Screen status
► Used devices
► LG Nexus 4 (100 hours)
► HTC Desire 816 (20 hours)
11/35
Data processing
► 3 steps
► 1. Data fusion
► 2. Data transformation
► 3. Feature extraction
12/35
Data processing
► 1. Data fusion
► Timestamps
► Gathering halts
► Sample rate
► 50 Hz, 20 Hz, 10 Hz, 5 Hz, 2 Hz and 1 Hz
[Pipeline diagram: sensor streams (RMS, dBs; acceleration, gyroscope, compass; luminosity, screen) feed 1. Data fusion → 2. Data transformation → 3. Feature extraction]
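The speaker notes say the fusion step aligns all sensor streams to a constant rate, with linear interpolation filling the missing values. A minimal illustrative sketch of that idea (not the authors' code, which is not included in the slides):

```python
from bisect import bisect_right

def resample(timestamps, values, rate_hz):
    """Resample an irregularly timestamped signal to a uniform rate
    using linear interpolation between neighbouring samples."""
    step = 1.0 / rate_hz
    t = timestamps[0]
    out = []
    while t <= timestamps[-1]:
        i = bisect_right(timestamps, t)
        if i >= len(timestamps):
            out.append(values[-1])  # past the last sample: hold the last value
        else:
            t0, t1 = timestamps[i - 1], timestamps[i]
            v0, v1 = values[i - 1], values[i]
            frac = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
            out.append(v0 + frac * (v1 - v0))
        t += step
    return out
```

Running the same routine per sensor stream over a shared time base gives the fused dataset at each of the evaluated rates (50, 20, 10, 5, 2 and 1 Hz).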
13/35
Data processing
► 1. Data fusion
► 2. Data transformation
► Raw to processed characteristics
[Pipeline diagram: raw signals (RMS, dBs; acceleration, gyroscope, compass; luminosity, screen) → transformed variables: LPF(RMS), LPF(dBs); linear acc., earth acc.; log(lum), fixedLum, log(fixedLum)]
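Three of the transformations can be sketched in code — an illustrative Python sketch, not the authors' implementation: a first-order low-pass filter for the noisy audio features, the screen-status-based luminosity fix described in the notes, and a log transform for the skewed luminosity distribution. The `alpha` value and the use of `log1p` (to handle zero lux) are assumptions; the slides give no filter parameters.

```python
import math

def low_pass(samples, alpha=0.1):
    """First-order low-pass (exponential smoothing) to tame noisy
    audio features such as RMS power or dBs. alpha is illustrative."""
    out = [samples[0]]
    for x in samples[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def fix_luminosity(lux, screen_on):
    """Replace readings taken with the screen off (phone likely in a
    pocket, sensor reads 0) with the nearest screen-on reading."""
    on_idx = [i for i, s in enumerate(screen_on) if s]
    fixed = []
    for i, v in enumerate(lux):
        if screen_on[i]:
            fixed.append(v)
        else:
            j = min(on_idx, key=lambda k: abs(k - i))
            fixed.append(lux[j])
    return fixed

def log_lum(lux):
    """log(1 + lux): normalises the heavily skewed, long-tail
    luminosity distribution (log1p is an assumption to avoid log(0))."""
    return [math.log1p(v) for v in lux]
```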
14/35
Data processing
► 1. Data fusion
► 2. Data transformation
► 3. Feature extraction
[Pipeline diagram: transformed variables (LPF(RMS), LPF(dBs); linear acc., earth acc.; log(lum), fixedLum, log(fixedLum)) → per-window features: max, min, mean, median, standard deviation]
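The windowing step — splitting each variable into frames and computing the five aggregate statistics — can be illustrated like this (a Python sketch; the original pipeline is not published with the slides):

```python
from statistics import mean, median, stdev

def window_features(samples, window_size):
    """Split a signal into non-overlapping windows and compute the
    five aggregates used as features per variable: max, min, mean,
    median and standard deviation."""
    feats = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        w = samples[start:start + window_size]
        feats.append({
            "max": max(w), "min": min(w), "mean": mean(w),
            "median": median(w), "std": stdev(w),
        })
    return feats
```

In the evaluation, `window_size` would correspond to 60-240 seconds of fused samples at the chosen sampling rate.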
15/35
Introduction
System Design
Evaluation
Conclusion
16/35
Evaluation
► Training Set
► 10x5-fold cross validation
► Nexus 4 70h
► Test Set
► Nexus 4 30h
► HTC Desire 20h
► Classifiers
► Random forest
► Support vector machine (SVM) - Gaussian radial basis function kernel
► k-Nearest Neighbours (k-NN)
► Naive Bayes classifier
► Parameters
► The best features to use
► The most suitable window sizes
► Classifier comparison
► Sensor sampling rate comparison
► Performance
► Recall
► Specificity
► AUC
► Accuracy
What is the best combination of parameters to detect bar-like environments?
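The evaluation used a 10-repetition 5-fold cross validation (done with R's caret, per the notes). A minimal Python sketch of the split scheme, for illustration only:

```python
import random

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV:
    each repetition reshuffles the indices, then each of the k
    folds serves once as the held-out test fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f in folds[:held_out] + folds[held_out + 1:] for i in f]
            yield train, test
```

With k=5 and 10 repetitions this produces 50 train/test splits, over which performance metrics are averaged.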
17/35
Evaluation
► Feature comparison
► Acceleration features comparison
► Vector norm -> better results for random forest, SVM and k-NN
► Types of acceleration
– Linear = "Earth-acceleration" (no significant difference)
– Base acc. better than linear & "Earth-acceleration" (random forest and SVM, up to 4%)
► Audio features comparison
► dB better than RMS:
– SVM (4%-9%), k-NN (6%-15%), Naive Bayes (2%-8%)
► Filtered better than unfiltered (k-NN is the only exception)
► Luminosity features comparison
► Combination of log transformation and the fixed version is the best choice
– Random forest (1%), SVM (3%), k-NN (-), Naive Bayes (11%)
18/35
Evaluation
► Contribution of each sensor
► Training with the best performing feature of each sensor
► concluded in the previous comparisons
► Results
► Excluding audio: performance declines by 15% to 20%
► Excluding acceleration: performance declines by 1% to 10%
► Luminosity only useful for SVM and Naive Bayes
19/35
Evaluation
► Window size comparison
► Common pattern: the smaller the window size, the worse the results
► Random forest
► 240 seconds
► 120 or 90 -> 2% performance loss
► SVM
► 120 seconds
► 60 seconds -> 2% performance loss
► k-NN classifier
► 180 seconds
► 60 seconds -> less than 2% performance loss
► Naive Bayes
► 240 seconds
► 120 seconds -> 2% performance loss
20/35
Evaluation
► Sample rate comparison
► Smaller window sizes suffer more than bigger ones when this parameter is decreased
21/35
Evaluation
► Classifier comparison
► The best is SVM
► + recall
► + AUC
► + accuracy
► Random forest
► + specificity
22/35
Evaluation
► The best performing configuration
► SVM
► Features
► Linear acceleration
► Filtered dBs
► Log-transformed fixed luminosity
► Capable of generalizing to new environments
► User and device dependencies
► Results
[Two confusion matrices (Nexus 4 and HTC test sets): rows Bar-like / Other, cells TP, FN, FP, TN; numeric values not preserved in this export]
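The reported metrics follow directly from the confusion-matrix counts shown on the slide. As a sketch (AUC is omitted, since it needs ranked classifier scores rather than counts):

```python
def metrics(tp, fn, fp, tn):
    """Recall, specificity and accuracy from confusion-matrix counts:
    tp/fn are Bar-like instances classified right/wrong, fp/tn are
    Other instances classified wrong/right."""
    return {
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }
```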
23/35
Introduction
System Design
Evaluation
Conclusion
24/35
Conclusion
► Findings
► The preliminary results obtained seem promising regarding the recognition of new
locations for the same user.
► However, generalization to new users seems to be more troublesome.
► Future work
► New data collection campaign involving more users in order to better study these aspects
► Study which is the most descriptive statistic for each feature (mean, median, standard deviation, minimum and maximum)
► Search for better recognition results with separate classes for each type of bar-like environment, as this could better capture the particular characteristics of each of these environments
25/35
Thank you for your
attention
26/35
DeustoTech-Deusto Institute of Technology, University of Deusto
http://www.morelab.deusto.es
Facing up social activity recognition using smartphone sensors
Pablo Curiel, Ivan Pretel, Ana B. Lago
{pcuriel@deusto.es} {ivan.pretel@deusto.es} {anabelen.lago@deusto.es}
27/35
All rights of images are reserved by the original owners*; the rest of the content
is licensed under a Creative Commons BY-SA 3.0 license.
*
• http://mami.uclm.es/ucami-iwaal-amihealth-2015
• https://flic.kr/p/enRrs9
• https://www.iconfinder.com/yudha_ap
• https://www.iconfinder.com/iconsets/stash
• https://www.iconfinder.com/DemSt
• https://www.iconfinder.com/paomedia
• https://flic.kr/p/eD7GR
• https://flic.kr/p/8G1yiU


Editor's Notes

  • #3 This presentation has 4 main sections. First, I will introduce the motivation and the main aim of this work, which is recognizing bar-like environments. Second, I will explain the approach followed to achieve this aim. Next, the evaluation of our approach. And finally, the conclusions and future work.
  • #5 Several works address the problem of recognizing user context. They relied on ad-hoc architectures, either equipping spaces with a sensing infrastructure or attaching sensors to the human body.
  • #6 However, current smartphones are equipped with a wide variety of sensors, like the accelerometer, gyroscope, geomagnetic sensor, luminosity sensor, microphone or GPS, among others. Consequently, they are an ideal replacement for those early ad-hoc sensing stations. As a result, the context recognition area has greatly benefited from mobile technologies. In addition, smartphones are part of our daily lives and we carry them with us everywhere, all the time. For this reason, context information is especially important in the mobile computing area, where users' context and needs change rapidly.
  • #7 In fact, in recent years context awareness has become a reality in real-world mobile applications. In particular, simple context information like location is commonly used in commercial applications like Foursquare, to suggest interesting venues nearby, or Twitter, to tell which topics are trending in each user's location. More recently, more complex context recognition is gaining presence in everyday products. For instance, several physical activity tracking applications are very popular nowadays; they are capable of distinguishing a number of activities like walking, running, cycling or climbing stairs. However, much complex or high-level context information does not follow a clear or recognizable pattern that can be interpreted using low-level sensor data. This is the case for many high-level user activities like cooking or reading a newspaper, and for user environments like home, workplace, bar or public transport.
  • #8 In this paper we address the issue of environment recognition as a means to tackle a high-level user activity: socialization. Detecting when users are engaged in social contexts would enable services like "social reminders": coupon or discount applications, or marketing campaigns that aim to attract the current clients' friends or to present promotions at the most appropriate moment. However, directly inferring a social interaction using smartphone sensors is not possible. For this reason, in the present work we address this issue by means of environment recognition: detecting when a user is in a bar, pub, restaurant or similar establishment. In many countries, like Spain, these kinds of establishments are among the main places of socialization, so we consider that recognizing these kinds of venues is useful.
  • #10 For the task of deciding which data to use for recognizing bar-like establishments we considered the main characteristics that all of them share. In general, we can describe them as: - Noisy places, with continuous murmur, music playing or a TV on, among other noises. - People either sitting or standing, depending on the kind of establishment, but usually in stationary positions. - Low-light locations, especially low in the case of pubs, for instance, and less dark in others like restaurants; but in general terms, they can be described as artificially lighted places. Consequently, we will use audio, acceleration and luminosity as data sources for the recognition task.
  • #11 In order to train and test classification algorithms, we first need an annotated dataset covering a diverse list of bar-like establishments and other non-bar environments. For this purpose, we developed an Android application, used by two users, which gathers: audio root mean square power (RMS) and decibels (dBs), from the microphone; 3-axial acceleration, from the acceleration, gyroscope and geomagnetic sensors; luminosity, from the luminosity sensor; and finally, screen status, provided by the Android framework, which is used to transform the luminosity data.
  • #12 Once we capture the raw data using the Android application we must process it before feeding the classification algorithms with the generated dataset. First we carry out a data fusion process where data coming from the different sensors is combined at a constant and uniform sampling rate. Second, we make some transformations to improve the properties of the data for the classification task. Finally, we extract the final features that will be used in the classifiers.
  • #13 Not all sensors are capable of providing samples at the same rate or of offering synchronized times across all of them. Because the devices used for data capturing were used normally while the data gathering was running, some sensors suffered occasional increases, delays or halts in their sampling rates. Following the 50 Hz sample rate requested by default, we fuse data at this constant rate and also at 20, 10, 5, 2 and 1 Hz. Linear interpolation is used to compute the missing values.
  • #14 In this step we apply transformations to the raw data variables in order to generate new ones. Acceleration is processed to generate both linear (with no gravity component) and earth-coordinate versions of it. In addition, these data are augmented with the acceleration vector norm. Audio data is too noisy in its raw form, so we also generated filtered versions of the sound variables using a low-pass filter. Luminosity data exposes a remarkable issue: due to normal operation of the mobile phone while data was captured, it was placed inside pockets for long periods, resulting in repeated zero values which do not correspond to the true luminosity of the environment. To tackle this problem, we process the luminosity data and fill those zero values with the closest sample observed while the screen was on. Additionally, luminosity data follows a heavily skewed, long-tail distribution which can be tricky for some classifiers, so we also generated a log-transformed version of this variable which presents a more normal shape.
  • #15 The final step before feeding the data into a classifier is feature extraction. First we grouped data into the labeled environments and after that we split data into window frames to compute these features. For each window frame, we compute the mean, median, standard deviation, minimum and maximum values of all the variables. These aggregated measures make up our features for the classification task.
  • #17 With this evaluation we study what is the best combination of parameters to detect bar-like environments. We split the 100 hours captured with the Nexus 4 into 70 hours for training and 30 for testing. The 20 hours of data captured with the HTC were dedicated to the test set, in order to evaluate how well the classifiers are able to generalize to different users and devices. Regarding classifier training, we used a 10-repetition 5-fold cross validation. Additionally, we trained four different classifiers for a more exhaustive comparison: a random forest, a support vector machine (SVM), k-Nearest Neighbours (k-NN) and a Naive Bayes classifier. There are several configuration parameters to study: the best features to use, the most suitable window sizes, and the classifier performance decay with decreasing sensor sampling rates. We measure performance using recall, specificity, area under the ROC curve (AUC) and accuracy. For all these evaluation tasks we used R in its 3.1 version. For training and testing classifiers we used the caret package (version 6.0-41), and more specifically, the randomForest package (version 4.6-7) for random forests, kernlab [4] (version 0.9-20) for the SVM, the R built-in class package for k-NN and the klaR [15] package (version 0.6-12) for the Naive Bayes classifier.
  • #18 Focusing on the feature comparison. With acceleration features we compared two aspects: whether adding the vector norm was useful, and whether either linear or earth-acceleration leads to better results than base acceleration. In general, the vector norm can be a useful feature to add to the 3-axis acceleration: for random forest, SVM and k-NN it leads to better classification results for the three types of acceleration, but this improvement is in general subtle (around 1%). Regarding the comparison of the three types of acceleration, results differ more. For all classifiers, there is no significant difference between linear and earth acceleration. Comparing base acceleration with the transformed ones, both random forest and SVM show better results (up to 4% better). With audio features we also compared two aspects. Comparing the two feature types, dBs are significantly better than RMS, except for the random forest, which shows no difference between the two; this improvement ranges from 4 to 9% in the case of the SVM, from 6 to 15% for k-NN and from 2 to 8% for Naive Bayes. In the case of the filtering, there are fewer differences: except for k-NN, which with RMS works better with the unfiltered version, the filtered feature leads to better results. With luminosity features we studied whether the two applied transformations are useful. Although using the log transformation and the fixed version separately does not give better results, combining both transformations gives a substantial improvement, the biggest for Naive Bayes.
  • #19 The last step in the feature comparison is studying the contribution of the features extracted from each sensor to classifier performance. This was done by training the classifiers with the best performing feature of each sensor, then training them again excluding the features captured by each sensor. The results are the following. Audio features are the most important: the decline in performance without them for the four classifiers ranges from a significant 15% to 20%. Acceleration features are less important, but their contribution nevertheless ranges from a significant 1% to 10%. In contrast, luminosity features are only useful for SVM and Naive Bayes.
  • #20 Once the best performing features were selected, we used them for the window comparisons. Although the best performing window size varies for each classifier, a common pattern can be seen: as expected, the smaller the window size, the worse the results. Considering each classifier independently, random forest shows the best performance for 240-second windows; the average performance loss for smaller windows is around 2%. In the case of the SVM, the best performing window is 120 seconds. For the k-NN classifier, the best is a 180-second window. Finally, Naive Bayes stands out with 240-second windows.
  • #21 Later, we studied how decreasing the sampling rate of the sensors impacts classifier performance. As expected, smaller window sizes suffer more than bigger ones when this parameter is decreased.
  • #22 Lastly and having selected both the best performing features and the best window sizes, we can compare how well each classifier performs the recognition task. As it can be observed, the best performing classifier is SVM, which outperforms the others in recall, AUC and accuracy. Only random forest beats SVM in specificity.
  • #23 After studying the best performing configuration, we selected the SVM with linear acceleration, filtered dBs and log-transformed fixed luminosity as the best classifier. With the first test set (Nexus 4) we have satisfactory results. However, the HTC test set results are much less satisfactory. The reason for these results is that, as can be observed, only "Bar" is successfully classified. Seeing this, we tried training the classifier with this second test data to evaluate its performance; in this case, results were much more satisfactory. It means that the proposed system is at least capable of generalizing to new environments captured by the same user and device.
  • #25 With this study we’ve concluded that…