Data Analysis. Predictive Analysis. Predicting the activity a subject performs based on measurements obtained from a smartphone's accelerometer and gyroscope
This document summarizes an analysis that used data from smartphone sensors to predict human activities. Several classification techniques were tested on a dataset containing sensor measurements for activities such as walking, sitting, and climbing stairs. The support vector machine (SVM) algorithm achieved the highest accuracy, 91.52%, at predicting activities. Some activities, such as sitting and standing, were more difficult to distinguish than lying down. Increasing the amount of training data could potentially improve prediction accuracy further.
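As an illustration of the kind of SVM classifier the summary refers to, here is a minimal sketch using scikit-learn. The feature values, cluster centres, and activity labels are invented stand-ins for the real accelerometer/gyroscope features, not the actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for smartphone sensor features: each row holds summary
# statistics (e.g. mean/std of accelerometer axes) for one time window.
rng = np.random.default_rng(0)
X_walk = rng.normal(loc=1.0, scale=0.3, size=(50, 6))    # "walking" windows
X_sit = rng.normal(loc=-1.0, scale=0.3, size=(50, 6))    # "sitting" windows
X = np.vstack([X_walk, X_sit])
y = np.array(["walking"] * 50 + ["sitting"] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = SVC(kernel="rbf", C=1.0)    # RBF-kernel SVM, a common default
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

In the real analysis, each row would hold features derived from windows of accelerometer and gyroscope readings, and held-out accuracy would be estimated the same way via `clf.score`.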
A multilabel classification approach for complex human activities using a com... (IJECEIAES)
In our daily lives, humans perform different Activities of Daily Living (ADL), such as cooking and studying. By nature, humans perform these activities in either a sequential/simple or an overlapping/complex scenario. Many research attempts have addressed simple activity recognition, but complex activity recognition is still a challenging issue. Recognition of complex activities is a multilabel classification problem, in which a test instance may be assigned multiple overlapping activities. Existing data-driven techniques for complex activity recognition can recognize at most two overlapping activities and require a training dataset of complex (i.e. multilabel) activities. In this paper, we propose a multilabel classification approach for complex activity recognition using a combination of Emerging Patterns and Fuzzy Sets. Our approach requires a training dataset of only simple (i.e. single-label) activities. First, we use a pattern mining technique to extract discriminative features called Strong Jumping Emerging Patterns (SJEPs) that exclusively represent each activity. Then, our scoring function takes SJEPs and fuzzy membership values of incoming sensor data and outputs the activity label(s). We validate our approach using two different datasets. Experimental results demonstrate the efficiency and superiority of our approach against other approaches.
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... (IJDKP)
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone,
shopping, and individual records are regularly generated. Sharing these data has proved beneficial
for data mining applications. On one hand, such data is an important asset for business decision making
through analysis. On the other hand, data privacy concerns may prevent data owners from sharing information
for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution
that achieves the dual goals of privacy preservation and accuracy of the data mining tasks,
clustering and classification. An efficient and effective approach has been proposed that aims to protect
the privacy of sensitive information while obtaining data clustering with minimum information loss.
The Use of K-NN and Bees Algorithm for Big Data Intrusion Detection System (IOSRjournaljce)
The big data problem in intrusion detection systems is mainly due to the large volume of data. The original data has 41 dimensions, and some of its features are unnecessary. In this process, the volume of data has expanded into hundreds and thousands of gigabytes (GB) of information. The dimensionality and volume of the data can be reduced, and the system enhanced, by using K-NN and BA. Because processing the full KDD dataset is very slow, the data has been reduced by extracting features with the Bees Algorithm (BA) and using k-nearest neighbors (KNN) for classification. The KDD99 datasets were then applied in the experiments with only the significant features. The results give higher detection and accuracy rates as well as a reduced false positive rate. Keywords: Big Data; Intru
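The classification half of the pipeline above can be sketched without the feature-reduction machinery; the records, retained features, and labels below are synthetic stand-ins for reduced KDD99 connections, and the Bees Algorithm step is omitted:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training records."""
    dists = sorted(
        (math.dist(x, row), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Synthetic stand-in for KDD99 records after feature reduction:
# 4 retained features instead of the original 41.
train_X = [[0.1, 0.2, 0.0, 0.3], [0.0, 0.1, 0.2, 0.1],   # normal traffic
           [0.2, 0.0, 0.1, 0.2], [0.1, 0.3, 0.1, 0.0],
           [4.0, 3.9, 4.1, 4.2], [4.2, 4.0, 3.8, 4.1],   # intrusions
           [3.9, 4.1, 4.0, 3.9], [4.1, 3.8, 4.2, 4.0]]
train_y = ["normal"] * 4 + ["intrusion"] * 4

# Classify a new connection record.
pred = knn_predict(train_X, train_y, [4.0, 4.0, 4.0, 4.0], k=3)
```

In a full system, the retained feature indices would come from the Bees Algorithm search rather than being fixed by hand.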
Towards Automatic Composition of Multicomponent Predictive Systems (Manuel Martín)
Automatic composition and parametrisation of multicomponent predictive systems (MCPSs) consisting of chains of data transformation steps is a challenging task. In this paper we propose and describe an extension to the Auto-WEKA software which now allows composing and optimising such flexible MCPSs as a sequence of WEKA methods. In the experimental analysis we focus on examining the impact on solution quality of significantly extending the search space by incorporating additional hyperparameters of the models. In a range of extensive experiments, three different optimisation strategies are used to automatically compose MCPSs on 21 publicly available datasets. A comparison with previous work indicates that extending the search space improves classification accuracy in the majority of cases. The diversity of the found MCPSs is also an indication that fully and automatically exploiting different combinations of data cleaning and preprocessing techniques is possible and highly beneficial for different predictive models. This can have a big impact on the development, maintenance, and scalability of high-quality predictive models in modern application and deployment scenarios.
Initial Optimal Parameters of Artificial Neural Network and Support Vector Re... (IJECEIAES)
This paper presents the architecture of backpropagation Artificial Neural Network (ANN) and Support Vector Regression (SVR) models in a supervised learning process for a cement demand dataset. The study aims to identify the effectiveness of each parameter using mean square error (MSE) indicators for the time series dataset. The study varies random samples in each demand parameter in the ANN network and the support vector function as well. For ANN, variations of the activation function, learning rates of sigmoid and purelin, hidden layers, neurons, and training function are applied. Furthermore, SVR is varied in kernel function, loss function, and insensitivity to obtain the best result from its simulation. The best activation function for ANN in this study is sigmoid, with 100% of the data (96 records) as input, a learning rate of 150, one hidden layer, the trainlm training function, 15 neurons, and 3 total layers. The best results for SVR run six variables in optimal condition: the kernel function is linear, the loss function is ε-insensitive, and the insensitivity was 1. The better results for both methods use six variables. The contribution of this study is to obtain the optimal parameters for specific variables of ANN and SVR.
Column store decision tree classification of unseen attribute set (ijma)
A decision tree can be used for clustering of frequently used attributes to improve tuple reconstruction time
in column-store databases. Due to the ad-hoc nature of queries, strongly correlated attributes are grouped
together using a decision tree to share a common minimum-support probability distribution. At the same
time, in order to predict the cluster for an unseen attribute set, the decision tree can work as a classifier. In
this paper we propose classification and clustering of unseen attribute sets using a decision tree to improve
tuple reconstruction time.
Histogram-based multilayer reversible data hiding method for securing secret ... (journalBEEI)
In this modern age, data can be easily transferred within networks. This makes the data vulnerable, so it needs protection at all times. To minimize this threat, data hiding appears as one of the potential methods to secure data. This protection is done by embedding the secret into various types of data, such as an image. For this purpose, histogram shifting has been proposed; however, the capacity for secret data and the quality of the resulting stego image remain challenging. In this research, we offer a method to improve its performance through several steps, such as removing the shifting process and employing multilayer embedding. Here, the embedding is done directly at the peak of the histogram generated from the cover. The experimental results show that the proposed method produces better stego image quality than existing ones, so it can be one of the possible solutions to protect sensitive data.
Improved target recognition response using collaborative brain-computer inter... (Kyongsik Yun)
One can achieve higher levels of perceptual and cognitive performance by leveraging the power of multiple brains through collaborative brain-computer interfaces.
Energy Efficient Mobile Targets Classification and Tracking in WSNs based on ... (idescitation)
Most energy management strategies in WSNs assume that data acquisition takes
more time than data transmission. But testing of several applications has shown
that the sensing activity of some sensors consumes significantly more energy than the radio.
Therefore it is very important for a sensor to decide whether an object is its desired target
before it actually starts sensing data. The idea here is to avoid altogether sensing
data from an object that is not the desired target. For this it is necessary to
isolate the parameters that uniquely identify a particular class of objects. At
present there are no lightweight object classification algorithms suitable for sensor networks
that perform such filtering at a preliminary stage. Furthermore, the main existing approaches for
efficient energy management in power-hungry sensors fall under two major
categories: duty cycling and adaptive sensing. In this paper, an energy-saving mobile target
tracking scheme is presented, based on the ID3 algorithm [6][12], which performs such
classification and elimination of undesired objects to reduce the volume of data acquisitions
and transmissions by a sensor.
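The core of ID3, choosing the split attribute with the highest information gain, can be sketched directly; the object signatures and target labels here are hypothetical, not taken from the paper's WSN setup:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list (the impurity measure ID3 uses)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on attribute index attr."""
    total = entropy(labels)
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in by_value.values())
    return total - remainder

# Hypothetical object signatures [size_class, speed_class] with labels
# marking whether the object is a desired target.
rows = [["big", "fast"], ["big", "slow"], ["small", "fast"], ["small", "slow"]]
labels = ["target", "target", "other", "other"]

gain_size = information_gain(rows, labels, 0)    # size separates perfectly
gain_speed = information_gain(rows, labels, 1)   # speed carries no signal
```

ID3 would split on the attribute with the larger gain (here, size), recursing on each subset until the labels are pure.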
Survey on evolutionary computation techniques and its application in dif... (ijitjournal)
In computer science, 'evolutionary computation' is an algorithmic tool based on evolution. It implements
random variation, reproduction, and selection by altering and moving data within a computer. It helps in
building, applying, and studying algorithms based on the Darwinian principles of natural selection. This
paper briefly studies different evolutionary computation techniques used in several applications, specifically
image processing, cloud computing, and grid computing. This work is an effort to help
researchers from different fields gain knowledge of the evolutionary computation techniques
applicable in the above-mentioned areas.
From sensor readings to prediction: on the process of developing practical so... (Manuel Martín)
Automatic data acquisition systems provide large amounts of streaming data generated by physical sensors. This data forms an input to computational models (soft sensors) routinely used for monitoring and control of industrial processes, traffic patterns, environment and natural hazards, and many more. The majority of these models assume that the data comes in a cleaned and pre-processed form, ready to be fed directly into a predictive model. In practice, to ensure appropriate data quality, most of the modelling efforts concentrate on preparing data from raw sensor readings to be used as model inputs. This study analyzes the process of data preparation for predictive models with streaming sensor data. We present the challenges of data preparation as a four-step process, identify the key challenges in each step, and provide recommendations for handling these issues. The discussion is focused on the approaches that are less commonly used, while, based on our experience, may contribute particularly well to solving practical soft sensor tasks. Our arguments are illustrated with a case study in the chemical production industry.
On comprehensive analysis of learning algorithms on pedestrian detection usin... (UniversitasGadjahMada)
Despite the surge of deep learning, deploying deep learning-based pedestrian detection in real systems faces hurdles, mainly due to huge resource usage. The classical feature-based detection system remains a feasible option. There have been many efforts to improve the performance of pedestrian detection systems. Among many feature sets, the Histogram of Oriented Gradients seems very effective for person detection. In this research, various machine learning algorithms are investigated for person detection and evaluated to obtain the optimal accuracy and speed of the system.
At present, a huge amount of data is generated every minute and transferred frequently. Although
the data is sometimes static, most commonly it is dynamic and transactional, with new data
constantly being added to the existing data. To discover knowledge from this
incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time-consuming.
Moreover, to analyze the datasets properly, construction of an efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled data into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to
generate a reduced attribute set as a dynamic reduct, and an optimization algorithm that uses the
reduct to build the corresponding classification system. The method analyzes new data as it
becomes available and modifies the reduct accordingly to fit the entire dataset, from which
interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency, and attribute significance from Rough Set Theory are integrated for the generation of
the dynamic reduct set, and optimal classification rules are selected using the PSO method, which not only
reduces complexity but also helps achieve higher accuracy of the decision system. The proposed
method has been applied to benchmark datasets collected from the UCI repository; the dynamic
reduct is computed, and optimal classification rules are generated from it. Experimental
results show the efficiency of the proposed method.
HII: Histogram Inverted Index for Fast Images Retrieval (IJECEIAES)
This work aims to improve search speed by creating an indexing structure in a Content Based Image Retrieval (CBIR) system. We utilised an inverted index structure of the kind usually used in text retrieval, with a modification. The modified inverted index is built from histogram data generated using Multi Texton Histogram (MTH) and Multi Texton Co-Occurrence Descriptor (MTCD) on 10,000 images of the Corel dataset. When building the inverted index, we normalised the value of each feature into a real number and considered pairs of feature and value owned by a particular number of images. Based on our investigation of the MTCD histogram on 5,000 test data, we found that by considering histogram variable values owned by at most 12% of the images, the number of comparisons for each query can be reduced by 67.47%, the precision is 82.2%, and the rate of access to disk is 32.83%. We name our approach the Histogram Inverted Index (HII).
Fuzzy Type Image Fusion Using SPIHT Image Compression Technique (IJERA Editor)
This paper presents a fuzzy type image fusion technique using Set Partitioning in Hierarchical Trees (SPIHT).
It is concluded that fusion at higher single levels provides better fusion quality. The technique can be used
for fusion of fuzzy images as well as multi-modal image fusion. The proposed algorithm is very simple, easy to
implement, and could be used for real-time applications. The paper also provides a comparative study
between the proposed and previously existing techniques and validates the proposed algorithm using Peak Signal to
Noise Ratio (PSNR) and Root Mean Square Error (RMSE).
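The two validation metrics named above have direct definitions; here is a minimal NumPy sketch for 8-bit images, with random arrays standing in for the reference and fused images:

```python
import numpy as np

def rmse(a, b):
    """Root mean square error between two equal-shaped images."""
    return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB, relative to the 8-bit peak value."""
    e = rmse(a, b)
    return float("inf") if e == 0 else 20.0 * np.log10(max_val / e)

rng = np.random.default_rng(2)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
noise = rng.integers(-5, 6, size=(64, 64))            # small perturbation
noisy = np.clip(reference.astype(int) + noise, 0, 255).astype(np.uint8)

error = rmse(reference, noisy)
quality = psnr(reference, noisy)
```

Higher PSNR and lower RMSE indicate that the fused (or stego) image is closer to the reference.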
Human activity recognition with self-attention (IJECEIAES)
In this paper, a self-attention-based neural network architecture for human activity recognition is proposed. The dataset used was collected using smartphones. The contribution of this paper is a multi-layer multi-head self-attention neural network architecture for human activity recognition, compared against two strong baseline architectures: a convolutional neural network (CNN) and a long short-term memory network (LSTM). The dropout rate, positional encoding, and scaling factor are also investigated to find the best model. The results show that the proposed model achieves a test accuracy of 91.75%, which is comparable to both baseline models.
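For readers unfamiliar with the mechanism, single-head scaled dot-product self-attention (the building block of the multi-head architecture above) can be sketched in NumPy; the sequence length, feature width, and weights below are illustrative, not the paper's configuration:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(3)
seq_len, d_model = 5, 8        # e.g. 5 sensor time steps, 8 features each
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(X, W_q, W_k, W_v)
```

Each output row is a weighted mix of all time steps, which is what lets the model relate sensor readings across the whole window; a multi-head layer runs several such heads in parallel and concatenates their outputs.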
Influence of time and length size feature selections for human activity seque...ISA Interchange
In this paper, Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensors events. Alternative features selections of time feature values of sensors events and activity length size feature values are tested, respectively, and then the results of activity sequences recognition performances of Viterbi algorithm are evaluated. The results show that the selection of larger time feature values of sensor events and/or smaller activity length size feature values will generate relatively better results on the activity sequences recognition performances.
For the agriculture sector, detecting and identifying plant diseases at an early stage is extremely important and
still very challenging. Machine learning is an application of AI that helps us achieve this purpose effectively. It
uses a group of algorithms to analyze and interpret data, learn from it, and using it, smart decisions can be
made. For accomplishing this project, a dataset that contains a set of healthy & diseased plant leaf images are
used then using image processing we extract the features of the image. Then we model this dataset with
different machine learning algorithms like Random Forest, Support Vector Machine, Naïve Bayes etc. The aim is
to hold out a comparative study to spot which of those algorithm can predict diseases with the at most
accuracy. We compare factors like precision, accuracy, error rates as well as prediction time of different
machine learning algorithms. After all these comparison, valuable conclusions can be made for this project.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Improving the accuracy of fingerprinting system using multibiometric approachIJERA Editor
Biometric technology is a science that used to verify or identify the individual based on physical and/or
behavioral traits. Although biometric systems are considered more secure than other traditional methods such as
password, or key, they also have many limitations such as noisy image, or spoof attack. One of the solutions to
overcome these limitations, is by applying a multibiometric system. Multibiometric system has a significant
effect in improving the performance of both security and accuracy of the system. It also can alleviate the spoof
attacks and reduce the fail to enroll error. A multi-sample is one implementations of the multibiometric systems.
In this study, a new algorithm is suggested to provide a second chance for the genuine user who is rejected, to
compare his/her provided finger with the other samples of the same finger. Multisampling fingerprint is used to
implement this new algorithm. The algorithm is activated when the match score of the user is not equal to a
threshold but close to it, then the system provides another chance to compare the finger with another sample of
the same trait. Using multi-sample biometric system improved the performance of the system by reducing the
False Reject Rate (FRR). Applying the original matching algorithm on the presented database produced 3
genuine users, and 5 imposters for the same fingerprint. While after implementing the suggested condition, the
system performance is enhanced by producing 6 genuine users, and 2 imposters for the same fingerprint. This
work was built and executed depending on a previous Matlab code presented by Zhi Li Wu. Thresholds and
Receiver Operating Characteristic (ROC) curves computed before and after implementing the suggested
multibiometric algorithm. Both ROC curves compared. A final decision and recommendations are provided
depending on the results obtained from this project
A large number of techniques has been developed so far to tell the diversity of machine learning. Machine learning is categorized into supervised, unsupervised and reinforcement learning .Every instance in given data-set used by Machine learning algorithms is represented same set of features .On basis of label of instances it is divided into category. In this review paper our main focus is on Supervised, unsupervised learning techniques and its performance parameters.
GI-ANFIS APPROACH FOR ENVISAGE HEART ATTACK DISEASE USING DATA MINING TECHNIQUESAM Publications
The process of selecting a subset of relevant features from the feature space for use in model construction and used to carry out the feature selection process is called as pre processing step. The filter approach computationally fast and given accuracy results. The Professional Medical Conduct Board Actions data consist of all public actions taken against physicians, physician assistants, specialist assistants, and medical professional. The Classification and Regression Trees (CART), which described the generation of binary decision trees CART were invented independently of one another at around the same time, yet follow a similar approach for learning decision trees from training tuples. The research used GI-ANFIS is used to data mining technique on heart data sets to provide the diagnosis results.
Similar to Data Analysis. Predictive Analysis. Activity Prediction that a subject performs based in measurements obtained from the accelerometer and gyroscope of the Smartphones (20)
Conocer las diferencias entre los distintos algoritmos de aprendizaje automático.Utilizar una herramienta para minería de datos y comparar varios algoritmos de aprendizaje automático. Para ello vamos a trabajar con la herramienta RapidMiner.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Data Analysis. Predictive Analysis. Activity Prediction that a subject performs based in measurements obtained from the accelerometer and gyroscope of the Smartphones
1. IT
[1] @gsantosgo
Information Technology
Data Analysis
Title: Predicting the activity that a subject performs based on measurements
obtained from the accelerometer and gyroscope of smartphones
Introduction:
Recently, our lives have been invaded by small mobile devices known as smartphones. These devices are mobile
mini-computers: they have an operating system that allows them to run applications, and they include a set of
applications to manage contacts and an address book, to create, edit or view different types of documents, to
access or browse the Web, and to provide telephony or messaging services. Beyond these features, most current
smartphones also incorporate cameras, GPS and various types of sensors.
In this analysis, we used data obtained from the accelerometer [1] and gyroscope [2] sensor signals of
smartphones. The accelerometer and gyroscope measure 3-axial linear acceleration and 3-axial angular
velocity; with these two sensors we can monitor device acceleration, position, orientation, rotation and angular
motion. All these data can be stored and used to recognize a user's activity. Here we refer to physical activities
that a person can perform daily, such as walking, walking upstairs, jogging, sitting, laying, etc.
The aim of this analysis was to perform a classification task. We took a dataset with its attributes
(acceleration, orientation, ...) and its labeled variable (in this case, activity), and then we created various
classification models, also known as classifiers. To create these classification models we can use various
classification algorithms, which use all the available information in a dataset to help us classify or
predict the activity performed by a person.
To create the classification models, we first chose different classification algorithms or techniques. For each
algorithm we applied what is called cross-validation [3]: we trained the algorithm on a training set consisting of
a subset of the observations in our dataset. The following task was to test each classification algorithm and
measure its accuracy, that is, whether the predictive model can correctly classify a human activity according to
the knowledge acquired during the training stage. This whole process is known as supervised learning [4].
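As a minimal sketch of this train-then-test process (toy data and a simple threshold rule as the "classifier", not the sensor dataset or the algorithms used later in this report):

```r
# Toy supervised-learning sketch in base R: split observations into a
# training set and a held-out test set, "learn" a rule on the training
# data, then measure accuracy on the test data.
set.seed(42)
n <- 150
toy <- data.frame(
  x1 = c(rnorm(75, mean = 0), rnorm(75, mean = 3)),   # one separating feature
  activity = factor(rep(c("sitting", "walk"), each = 75))
)
idx   <- sample(n, size = 100)      # 100 observations for training
train <- toy[idx, ]
test  <- toy[-idx, ]

# A trivial threshold rule learned from the training data only
thr  <- mean(tapply(train$x1, train$activity, mean))
pred <- factor(ifelse(test$x1 > thr, "walk", "sitting"),
               levels = levels(test$activity))
accuracy <- mean(pred == test$activity)   # fraction correctly classified
```

In the real analysis the threshold rule is replaced by the tree, bagging, random forest and SVM classifiers, but the split/train/evaluate structure is the same.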
Methods:
Data Collection
For this analysis we used a dataset on Human Activity Recognition. This dataset was downloaded from
coursera.org [5] for the Data Analysis course on March 03, 2013 using the R programming language. The data
were pre-processed to make them easier to load into R; they were derived from raw data in the UC Irvine
Machine Learning Repository [6], which hosts a Human Activity Recognition dataset [7] built from recordings of
30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with
embedded inertial sensors.
The dataset for this analysis contains 7352 observations and 563 variables. For each observation there is a
categorical (factor) variable called "activity" (our labeled variable or class) that indicates the activity carried
out by the subject; it has only six possible values: laying, sitting, standing, walk, walkdown and walkup.
There is also an integer variable called "subject" that identifies the person who performed the activity. Finally,
the remaining 561 variables are numeric (quantitative) variables containing time- and frequency-domain
features of the triaxial acceleration from the accelerometer (mean, standard deviation, energy, correlation,
etc.) and the triaxial angular velocity from the gyroscope.
For more information about these variables, see the compressed file [8], which contains descriptive files
documenting all the features and the labeled variable (class).
Exploratory Analysis
Exploratory analysis was performed by examining the data and plots of the observed data. It was used to
(1) identify missing values, (2) verify the quality of the data, (3) check that variable names are syntactically
correct, and (4) identify patterns that differ between activities, so as to be able to distinguish when a user
performs one activity or another.
Our predictive model [9] should be able to recognize the pattern corresponding to each activity. Figure 1 shows
the different patterns for the different activities according to the X-axis acceleration. We can observe that the
patterns differ according to the activity carried out by the user.
It is important to keep in mind that if some activities share common patterns, our predictive model will have
more difficulty classifying those activities correctly, and will therefore have lower accuracy, that is, more
difficulty distinguishing among activities.
Statistical Modeling
To classify the activity performed by a subject, we used various classification techniques or algorithms to
recognize and predict our labeled variable (activity). The techniques (classifiers) employed for this data
analysis are the following:
Decision Trees [10]
CART [11]
Bagging [12]
Random Forest [13]
SVM [14]
We performed cross-validation for each of these techniques (classifiers), and we evaluated the performance,
the accuracy and the error rate of each classifier.
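As a hedged sketch of how one of these classifiers is fitted and evaluated in R: CART via library(rpart), which ships with R, is shown below on the built-in iris data as a stand-in for the activity dataset (the activity data frame would be used the same way, with activity as the response).

```r
# Fit a CART classifier (rpart) on a training split and estimate its
# accuracy on a held-out test split. iris is only a stand-in dataset.
library(rpart)
set.seed(1)
idx   <- sample(nrow(iris), 100)   # 100 rows for training
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")
acc  <- mean(pred == test$Species)   # overall accuracy on the test set
```

The other classifiers in the list follow the same fit/predict pattern with their respective packages (tree, ipred, randomForest, e1071).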
Reproducibility
All analyses performed in this manuscript are reproduced in the R markdown file samsungPredictive.Rmd [15].
Note: due to security concerns with the exchange of R code, we do not submit code to reproduce the analysis
in this document.
Results:
As noted above, the dataset for this analysis contains 7352 observations of 563 variables; these observations
correspond to a total of 21 subjects. Table 1 shows the number of samples per subject and type of activity, as
well as the percentage of the total per activity in our dataset.
We found variables with syntactically incorrect names, that is, names containing characters such as commas
(",") or brackets ("("); it was therefore necessary to produce valid, non-duplicated variable names in our
dataset (data frame). We checked for missing values in the dataset and found none.
Our class (labeled variable) was transformed from a character variable to a factor variable with 6 levels:
"laying", "sitting", "standing", "walk", "walkdown" and "walkup".
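The cleaning steps just described can be sketched in base R. The data frame below is a tiny hypothetical stand-in for the loaded dataset; make.names and make.unique are the standard tools for repairing invalid or duplicated column names.

```r
# Repair invalid column names, check for missing values, and convert the
# class label to a factor, on a small stand-in data frame.
df <- data.frame(check.names = FALSE,
                 "tBodyAcc-mean()-X" = rnorm(3),   # name with "-" and "()"
                 subject  = c(1, 1, 3),
                 activity = c("laying", "walk", "sitting"))

names(df) <- make.unique(make.names(names(df)))  # valid, non-duplicated names
n_missing <- sum(is.na(df))                      # count of missing values
df$activity <- factor(df$activity)               # character -> factor label
```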
According to the assignment, for this data analysis we used a training set that includes the data from subjects
1, 3, 5 and 6, and a test set that includes the data from subjects 27, 28, 29 and 30. Table 2 shows the number
of samples per activity used in the training stage, and Table 3 the number of samples per activity used in the
testing stage.
id laying sitting standing walk walkdown walkup Total
1 50 47 53 95 49 53 347
3 62 52 61 58 49 59 341
5 52 44 56 56 47 47 302
6 57 55 57 57 48 51 325
7 52 48 53 57 47 51 308
8 54 46 54 48 38 41 281
11 57 53 47 59 46 54 316
14 51 54 60 59 45 54 323
15 72 59 53 54 42 48 328
16 70 69 78 51 47 51 366
17 71 64 78 61 46 48 368
19 83 73 73 52 39 40 360
21 90 85 89 52 45 47 408
22 72 62 63 46 36 42 321
23 72 68 68 59 54 51 372
25 73 65 74 74 58 65 409
26 76 78 74 59 50 55 392
27 74 70 80 57 44 51 376
28 80 72 79 54 46 51 382
29 69 60 65 53 48 49 344
30 70 62 59 65 62 65 383
Sum 1407 1286 1374 1226 986 1073 7352
% 19.14 17.49 18.69 16.68 13.41 14.59 100
Table 1. Number of samples per subject and type of activity
laying sitting standing walk walkdown walkup
55 50 57 64 49 53
Table 2. Number of samples per activity for Training
laying sitting standing walk walkdown walkup
74 64 71 56 52 54
Table 3. Number of samples per activity for Testing
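The subject-based split described above can be sketched in base R; df is a small hypothetical stand-in for the full data frame. Keeping the training and test subjects disjoint ensures the model is evaluated on people it has never seen.

```r
# Split by subject id: subjects 1, 3, 5, 6 for training and
# 27, 28, 29, 30 for testing, with no subject in both sets.
df <- data.frame(subject  = c(1, 3, 5, 6, 27, 28, 29, 30),
                 activity = rep(c("walk", "sitting"), 4))

train <- df[df$subject %in% c(1, 3, 5, 6), ]
test  <- df[df$subject %in% c(27, 28, 29, 30), ]
overlap <- intersect(train$subject, test$subject)   # should be empty
```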
We performed the process of cross-validation for each of the previous classifiers using the training set and
test set specified earlier.
The results obtained for the different classification techniques (predictive models) using the R programming
language are presented in Table 4. This table shows the accuracy of each classification technique per activity;
the best accuracy for each activity is the highest value in its row.
It is important to take into account that we used all 561 quantitative variables to predict the activity carried
out by a subject in these 5 classification techniques. Recall that with this many variables, the performance of
a classification algorithm may be severely affected: many of these quantitative variables could add noise to
the classification, and others may not provide useful information to distinguish among activities. On the other
hand, it would be very interesting to measure how much the classifiers are overfitting [16].
In general, most of the classification techniques used in this analysis achieve high levels of accuracy, but we
can observe lower accuracy for some activities and some classification techniques.
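The overfitting check suggested above can be sketched by comparing training accuracy against held-out accuracy: a large positive gap signals that the model memorizes the training data. Here rpart on the built-in iris data stands in for the activity dataset.

```r
# Gauge overfitting as the gap between training and test accuracy.
library(rpart)
set.seed(7)
idx <- sample(nrow(iris), 100)
fit <- rpart(Species ~ ., data = iris[idx, ], method = "class")

acc_train <- mean(predict(fit, iris[idx, ],  type = "class") == iris$Species[idx])
acc_test  <- mean(predict(fit, iris[-idx, ], type = "class") == iris$Species[-idx])
gap <- acc_train - acc_test   # the optimism of the training accuracy
```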
% Correctly Predicted
Activity    Tree            CART            Bagging         Random Forest           SVM
            library(tree)   library(rpart)  library(ipred)  library(randomForest)   library(e1071)
laying      100.00          100.00          100.00          100.00                  100.00
sitting      70.31           67.19           67.19           82.81                   82.81
standing     85.92           88.73           88.73           88.73                   88.73
walk         50.00           57.14           80.30           92.86                   92.86
walkdown     84.61           86.54           94.23           86.54                   86.54
walkup       85.19           85.19           87.03           96.30                   98.15
All          79.34           80.80           86.25           91.21                   91.52
Table 4. Accuracies of the classification techniques
Tables 5-9 show the confusion matrices for each of the classification techniques.
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 45 19 0 0 0
standing 0 10 61 0 0 0
walk 0 0 0 28 6 22
walkdown 0 0 0 0 44 8
walkup 0 0 0 1 7 46
Table 5. Confusion matrix for the Decision Tree
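Per-class accuracies in Table 4 can be recovered from these confusion matrices: each diagonal entry divided by its row sum. Reading Table 5 back into R reproduces the Tree column up to rounding (e.g. sitting: 45/64 ≈ 70.31%), and also suggests that the "All" row is the mean of the per-class accuracies (a macro average) rather than the overall fraction correct, which would be 80.32% here; this reading is an inference from the numbers, not stated in the text.

```r
# Table 5 (Decision Tree) as a matrix: rows = actual, columns = predicted.
acts <- c("laying", "sitting", "standing", "walk", "walkdown", "walkup")
cm <- matrix(c(74,  0,  0,  0,  0,  0,
                0, 45, 19,  0,  0,  0,
                0, 10, 61,  0,  0,  0,
                0,  0,  0, 28,  6, 22,
                0,  0,  0,  0, 44,  8,
                0,  0,  0,  1,  7, 46),
             nrow = 6, byrow = TRUE,
             dimnames = list(actual = acts, predicted = acts))

per_class <- 100 * diag(cm) / rowSums(cm)   # diagonal over row totals
macro     <- mean(per_class)                # matches the "All" row (79.34)
micro     <- 100 * sum(diag(cm)) / sum(cm)  # overall fraction correct (80.32)
```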
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 43 21 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 32 4 20
walkdown 0 0 0 0 45 7
walkup 0 0 0 1 7 46
Table 6. Confusion matrix for CART
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 43 21 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 53 0 3
walkdown 0 0 0 0 49 3
walkup 0 0 0 1 6 47
Table 7. Confusion matrix for Bagging
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 53 11 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 52 0 4
walkdown 0 0 0 0 47 5
walkup 0 0 0 0 2 52
Table 8. Confusion matrix for Random Forest
Predicted Class
Actual Class laying sitting standing walk walkdown walkup
laying 74 0 0 0 0 0
sitting 0 53 11 0 0 0
standing 0 8 63 0 0 0
walk 0 0 0 52 0 4
walkdown 0 0 0 0 47 5
walkup 0 0 0 0 1 53
Table 9. Confusion matrix for SVM
In general, we observed that the classification techniques correctly identify laying (100%). It appears much
more difficult to distinguish between sitting and standing, and also to distinguish among walk, walkdown and
walkup.
Bagging, Random Forest and SVM are classifiers that require more computing and memory resources, and
therefore more classification time, than Tree and CART.
Conclusions:
In this analysis, we employed various classification techniques to obtain different predictive models. The SVM
classifier achieved the highest accuracy in this analysis (91.52%). It would be advisable to increase the number
of observations, as well as the number of samples in both the training and test sets, and observe whether the
accuracy increases or decreases. On the other hand, there are problems detecting the patterns of some
activities, because different activities share many similar patterns and the classifier then fails to classify them
correctly.
References
[1] Accelerometer. http://en.wikipedia.org/wiki/Accelerometer. Accessed 03/04/2013
[2] Gyroscope. http://en.wikipedia.org/wiki/Gyroscope. Accessed 03/04/2013
[3] Cross-Validation. http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29. Accessed 03/10/2013
[4] Supervised Learning. http://en.wikipedia.org/wiki/Supervised_learning. Accessed 03/05/2013
[5] Dataset of Human Activity Recognition, Coursera. https://spark-public.s3.amazonaws.com/dataanalysis/samsungData.rda. Accessed 03/03/2013
[6] UC Irvine Machine Learning Repository. http://archive.ics.uci.edu/ml/. Accessed 03/06/2013
[7] Human Activity Recognition Using Smartphones Data Set. http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. Accessed 03/06/2013
[8] File of Human Activity Recognition, UCI. http://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip. Accessed 03/06/2013
[9] Predictive Modelling. http://en.wikipedia.org/wiki/Predictive_modelling. Accessed 03/10/2013
[10] Decision Tree Learning. http://en.wikipedia.org/wiki/Decision_tree_learning. Accessed 03/10/2013
[11] CART. http://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees. Accessed 03/10/2013
[12] Bagging. http://en.wikipedia.org/wiki/Bootstrap_aggregating. Accessed 03/10/2013
[13] Random Forest (RF). http://en.wikipedia.org/wiki/Random_forest. Accessed 03/10/2013
[14] Support Vector Machine (SVM). http://en.wikipedia.org/wiki/Support_vector_machine. Accessed 03/10/2013
[15] R Markdown Page. http://www.rstudio.com/ide/docs/authoring/using_markdown. Accessed 03/06/2013
[16] Overfitting. http://en.wikipedia.org/wiki/Overfitting. Accessed 03/10/2013