The Activity Recognition dataset (UCI) consists of various activities performed by the user (bending, walking, lying, etc.). Various data mining methods and classifiers (Naïve Bayes, Random Forest, Random Tree, etc.) can be used to evaluate the predictive capability of the classifiers in classifying each activity.

PREDICTIVE ANALYSIS ON ACTIVITY RECOGNITION SYSTEM

PROJECT REPORT – INSY 5339 – PRINCIPLES OF BUSINESS DATA MINING
INSY 5339 Principles of Business Data Mining – Dr. Sikora
TABLE OF CONTENTS
1. DATASET INTRODUCTION
   1.1 DATA MINING INTRODUCTION
   1.2 OBJECTIVE
   1.3 DATA BACKGROUND
   1.4 DATASET INFORMATION
2. DATA PREPARATION
   2.1 DATA CLEANING
3. ALGORITHMS USED
   3.1 ACCURACY ON FULL TRAINING SET
   3.2 ACCURACY ON CROSS FOLDS
   3.3 ACCURACY ON PERCENTAGE SPLIT
4. EXPERIMENTAL DESIGN
   4.1 RESULTS FOR EACH CLASSIFIER
   4.2 RELATIVE ACCURACY OF EXPERIMENTAL DESIGN
5. ROC CURVES
   5.1 ROC CURVE – KNOWLEDGE FLOW
   5.2 SINGLE CLASS VS 3 CLASSIFIERS
   5.3 ALL CLASSES VS ALL CLASSIFIERS
6. PRINCIPAL COMPONENT ANALYSIS
7. CONCLUSION
8. REFERENCES
1. DATASET INTRODUCTION:
1.1 DATA MINING INTRODUCTION:
Data mining is the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data. It is an interdisciplinary subfield of computer
science. The overall goal of the data mining process is to extract information from a data set
and transform it into an understandable structure for further use. Data mining can also be
defined as the semi-automatic or automatic analysis of large quantities of data to extract
previously unknown, interesting patterns such as groups of data records (cluster analysis),
unusual records (anomaly detection), and dependencies (association rule mining, sequential
pattern mining). This usually involves using database techniques such as spatial indices.
These patterns can then be seen as a kind of summary of the input data, and may be used in
further analysis or, for example, in machine learning and predictive analytics.
1.2 OBJECTIVE:
The main objective of this project is to determine the type of activity performed by the
user (bending, walking, sitting, standing, lying, cycling). This is determined using an
Activity Recognition system based on Multisensor data fusion (AReM) sensors. Our other
objective is to build and train the model using training and test data sets. The output
obtained in the previous steps will be used to determine the credibility of using the
AReM sensors for further experiments in activity sensing.
1.3. DATA BACKGROUND:
Activity Recognition (AR) is an emerging research topic founded on established
research fields such as ubiquitous computing, context-aware computing, multimedia,
and machine learning for pattern recognition. Recognizing everyday life activities is a
challenging application in pervasive computing, with many interesting developments in
the health care, human behavior modeling, and human-machine interaction domains.
Inferring the activity of users in their own domestic environments becomes even more
useful in the Ambient Assisted Living (AAL) scenario, where facilities provide assistance
and care for the elderly, and knowledge of their daily activities can ensure safety and
successful aging.
From the point of view of deploying activity recognition solutions, we recognize
three main approaches. The first kind of solution generally uses sensors (embedding
accelerometers, or transducers for physiological measures) that take direct measurements
of the user's movements. The disadvantage of this approach is that wearable devices can
be intrusive for the user, even if, with recent advances in embedded-systems technology,
sensors tend to be smaller and smaller. Solutions that avoid wearable devices are instead
motivated by the need for a less intrusive activity recognition system. Among these,
camera-based solutions are probably the most common; these are the second type of
sensors. More recently, a new generation of non-wearable solutions is emerging. These
solutions exploit the implicit alteration of the wireless channel due to the movements of
the user, which is measured by devices placed in the environment that measure the
Received Signal Strength (RSS) of the beacon packets they exchange among themselves.
1.4. DATA SET INFORMATION:
This dataset contains temporal data from a Wireless Sensor Network (WSN) worn by an
actor performing the activities bending, cycling, lying down, sitting, standing, and
walking. The classification task consists of predicting the activity performed by the user
from the time series generated by the WSN. In our activity recognition system we use
information coming from the implicit alteration of the wireless channel due to the
movements of the user. The devices, placed on the user's chest and ankles, measure the
RSS of the beacon packets they exchange among themselves in the WSN. For the purpose
of communication, the beacon packets are exchanged using a simple virtual token
protocol that completes its execution in a time slot of 50 milliseconds.
From the raw data we extract time-domain features to compress the time series and
partially remove noise and correlations. We chose an epoch time of 250 milliseconds. In
each such time slot we process 5 samples of RSS (sampled at 20 Hz) for each of the three
couples of WSN nodes (i.e. Chest-Right Ankle, Chest-Left Ankle, Right Ankle-Left Ankle).
The features comprise the mean value and standard deviation of each reciprocal RSS
reading from the worn WSN sensors. For each activity, 15 temporal sequences of input
RSS data are present. The dataset contains 480 sequences, for a total of 42,240 instances.
The positions of the sensor nodes, with their identifiers, are shown in the figure.
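The feature extraction described above can be sketched in a few lines of Python. This is an illustrative sketch only: the sensor-pair names and RSS sample values below are hypothetical stand-ins, not taken from the actual dataset files.

```python
from statistics import mean, pstdev

def extract_features(rss, epoch=5):
    """Compress raw RSS time series into per-epoch mean/std features.

    rss: dict mapping a sensor-pair name to a list of RSS samples (20 Hz).
    epoch: samples per epoch (5 samples = 250 ms at 20 Hz).
    Returns one feature row per epoch: [mean_pair1, std_pair1, ...].
    """
    n_epochs = min(len(v) for v in rss.values()) // epoch
    rows = []
    for e in range(n_epochs):
        row = []
        for pair in ("chest_r_ankle", "chest_l_ankle", "r_ankle_l_ankle"):
            window = rss[pair][e * epoch:(e + 1) * epoch]
            row += [mean(window), pstdev(window)]
        rows.append(row)
    return rows

# Hypothetical raw data: 10 samples (0.5 s) per sensor pair -> 2 epochs
raw = {
    "chest_r_ankle":   [40, 42, 41, 40, 42, 38, 39, 40, 38, 40],
    "chest_l_ankle":   [35, 35, 36, 34, 35, 33, 34, 33, 35, 35],
    "r_ankle_l_ankle": [30, 31, 30, 29, 30, 28, 29, 30, 28, 30],
}
features = extract_features(raw)  # two epochs, six features each
```

Each epoch thus contributes six attributes (a mean and a standard deviation per node pair), which matches the attribute structure of the AReM feature files.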
2. DATA PREPARATION:
2.1 DATA CLEANING:
Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or
removing) corrupt or inaccurate records from a data set, table, or database. Used mainly on
databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant
parts of the data and then replacing, modifying, or deleting this dirty or coarse data.
1. Merging all the data sets:
The data set was split into several files depending on the activity performed.
Without changing the attributes, we merged all the files into one Excel dataset
and added one extra attribute (the class attribute), named Activity, to
categorize the different actions performed.
2. Macros:
Definition: An Excel macro is a set of programming instructions, stored as VBA
code, that eliminates the need to repeat the steps of commonly performed tasks.
These repetitive tasks might involve complex calculations that require formulas,
or they might be simple formatting tasks, such as adding number formatting to
new data or applying cell and worksheet formats such as borders and shading.
Macros were used to find the missing values and amend them.
The final class attribute values are categorized as Bending1, Bending2, Cycling,
Lying, Sitting, Standing, and Walking.
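The merging step above can be sketched with Python's standard csv module. The attribute names and the in-memory files below are hypothetical stand-ins for the per-activity files, used only to show the idea of appending a class attribute while merging.

```python
import csv
import io

def merge_activity_files(files):
    """Merge per-activity CSV data and append a class attribute 'Activity'.

    files: dict mapping an activity label to a file-like object whose rows
    share the same attribute header. Returns (header, merged rows).
    """
    merged, header = [], None
    for activity, fh in files.items():
        reader = csv.reader(fh)
        file_header = next(reader)
        if header is None:
            header = file_header + ["Activity"]
        for row in reader:
            merged.append(row + [activity])   # tag each row with its class
    return header, merged

# Hypothetical in-memory stand-ins for two of the per-activity files
files = {
    "Walking": io.StringIO("avg_rss12,var_rss12\n40.0,0.5\n41.0,0.4\n"),
    "Lying":   io.StringIO("avg_rss12,var_rss12\n20.0,0.1\n"),
}
header, rows = merge_activity_files(files)
```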
3. ALGORITHMS USED FOR THIS EXPERIMENT:
After trying out various algorithms, the following algorithms have yielded the best results for
our experiment.
J48
J48 is the Weka implementation of the C4.5 decision-tree algorithm developed by
Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The
decision trees generated by C4.5 can be used for classification, and for this
reason, C4.5 is often referred to as a statistical classifier.
Naïve Bayes
A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify
objects. Naive Bayes classifiers assume strong, or naive, independence between
attributes of data points. These classifiers are widely used for machine learning
because they are simple to implement.
Decision Table
The Decision Table classifier builds a simple majority classifier over a selected
subset of attributes: training instances are summarized in a table keyed by those
attributes, and a new instance is classified by looking up the matching entries
and predicting their majority class (falling back on the overall majority class
when no entry matches).
Random Tree
Random Tree is a supervised classifier that builds a decision tree which, at each
node, considers only a randomly chosen subset of the attributes. Collections of
such randomized trees form the basis of ensemble methods such as Random Forest,
which combine many individual learners, often via bagging.
OneR
OneR, short for "One Rule", is a simple, yet accurate, classification algorithm that
generates one rule for each predictor in the data, then selects the rule with the
smallest total error as its "one rule". To create a rule for a predictor, we construct
a frequency table for each predictor against the target. It has been shown that
OneR produces rules only slightly less accurate than state-of-the-art classification
algorithms while producing rules that are simple for humans to interpret.
ZeroR
ZeroR is the simplest classification method which relies on the target and ignores
all predictors. ZeroR classifier simply predicts the majority category (class).
Although there is no predictability power in ZeroR, it is useful for determining a
baseline performance as a benchmark for other classification methods.
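As a rough illustration of these two baseline methods (a sketch, not Weka's implementation), ZeroR and OneR can be written in a few lines of Python; the attribute values and labels below are made up for the example.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: always predict the majority class, ignoring all predictors."""
    return Counter(labels).most_common(1)[0][0]

def one_r(X, y):
    """OneR: pick the single attribute whose one-level rule makes the
    fewest training errors. Returns (attribute index, {value: class})."""
    best = None
    for a in range(len(X[0])):
        table = defaultdict(Counter)          # frequency table: value -> class counts
        for row, label in zip(X, y):
            table[row[a]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        errors = sum(n for v, c in table.items()
                     for lbl, n in c.items() if lbl != rule[v])
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best[1], best[2]

# Toy data: one discretized attribute, five instances
X = [["low"], ["low"], ["high"], ["high"], ["high"]]
y = ["Lying", "Lying", "Walking", "Walking", "Lying"]
majority = zero_r(y)        # the ZeroR baseline prediction
attr, rule = one_r(X, y)    # the single best rule found by OneR
```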
3.1 ACCURACY ON FULL TRAINING SET:
In this step we used the full training set with all algorithms to determine which one is
best for our analysis.
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0459                               82.9541
OneR              48.0383                               51.9615
Naïve Bayes       64.3386                               35.6614
J48               87.8974                               12.1026
Decision Table    71.5121                               28.4879
Random Tree       98.8612                                1.1388
3.2 ACCURACY ON CROSS FOLDS:
Here we used cross-validation with 10 folds and performed the test with all
algorithms. The results are given as a table and a chart.
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0459                               82.9541
OneR              47.2881                               52.7119
Naïve Bayes       64.3955                               35.6045
J48               78.9578                               21.0422
Decision Table    65.314                                34.686
Random Tree       75.3687                               24.6313
[Bar chart: correctly vs. incorrectly classified instances (%) per classifier, 10-fold cross-validation]
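The partitioning behind 10-fold cross-validation can be sketched as follows; this is a minimal illustration of the idea (Weka's own implementation additionally stratifies the folds by class).

```python
import random

def cross_val_folds(n, k=10, seed=1):
    """Partition instance indices 0..n-1 into k folds.

    Each fold serves once as the test set while the remaining k-1 folds
    form the training set; the reported accuracy is averaged over folds.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # fixed seed for reproducibility
    return [idx[i::k] for i in range(k)]

# The AReM dataset has 42,240 instances in total
folds = cross_val_folds(42240, k=10)
```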
3.3 ACCURACY ON PERCENTAGE SPLIT:
In this step, we have used percentage split of 66% to predict the accuracy of each algorithm.
The results are displayed in the form of graphs as well as tables.
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0253                               82.9747
OneR              47.3226                               52.6774
Naïve Bayes       63.6864                               36.3136
J48               77.5503                               22.4497
Decision Table    65.2253                               34.7747
Random Tree       72.9058                               27.0942
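A percentage split can be sketched as a seeded shuffle followed by a cut. This mirrors the idea of Weka's randomized percentage split, though not its exact behavior; the instance indices below are a stand-in for real data rows.

```python
import random

def percentage_split(instances, train_pct=0.66, seed=1):
    """Shuffle with a fixed seed, then split into train/test partitions.

    Varying `seed` changes which instances land in the test set, which is
    how repeated runs of a percentage-split experiment differ.
    """
    data = list(instances)
    random.Random(seed).shuffle(data)
    cut = round(len(data) * train_pct)
    return data[:cut], data[cut:]

# 66% of 100 hypothetical instances go to training, the rest to testing
train, test = percentage_split(range(100), train_pct=0.66, seed=1)
```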
[Bar chart: correctly vs. incorrectly classified instances (%) per classifier, 66% percentage split]
Based on these various tests, we concluded that the following three algorithms are good
enough to predict the class attribute:
• J48
• Decision Table
• Random Tree
We confirmed these results with the Receiver Operating Characteristic (ROC) graphs,
plotting true positive rate against false positive rate for the six algorithms, and found
that these three algorithms have better accuracy as well as a larger area under the curve.
4. EXPERIMENTAL DESIGN:
A full factorial experiment is an experiment consisting of two or more factors, each
with discrete levels, where the experimental units take on all possible combinations
of these levels across all factors. Such an experiment allows the investigator to
study the effect of each factor on the response variable.
We selected the following classifiers for our experimental design:
• J48
• Decision Table
• Random Tree
Four-Cell Experimental Design:
The design consists of 2 factors, each with 2 levels:
• Noise: without noise / with 10% noise
• Percentage split: 66% / 75%

                 % Split - 66%   % Split - 75%
Without Noise         C1              C3
With Noise            C2              C4

• C1 – Percentage split 66%, without noise
• C2 – Percentage split 66%, with noise
• C3 – Percentage split 75%, without noise
• C4 – Percentage split 75%, with noise

Total number of experiments = number of conditions × number of classifiers × number of runs
= 4 × 3 × 10 = 120 runs
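The run count above, together with one simple way of injecting 10% class noise, can be sketched as follows. The noise function is an assumption for illustration: the report does not specify the exact mechanism used to add noise.

```python
import random
from itertools import product

# The four cells of the 2x2 design: (name, train fraction, noise flag)
conditions = [
    ("C1", 0.66, False),  # 66% split, without noise
    ("C2", 0.66, True),   # 66% split, with 10% noise
    ("C3", 0.75, False),  # 75% split, without noise
    ("C4", 0.75, True),   # 75% split, with 10% noise
]
classifiers = ["J48", "Decision Table", "Random Tree"]
seeds = range(1, 11)  # ten repetitions per cell

# Every run of the full factorial experiment: 4 * 3 * 10 = 120
runs = list(product(conditions, classifiers, seeds))

def add_class_noise(labels, classes, frac=0.10, seed=0):
    """Reassign `frac` of the class labels at random (an assumed, simple
    noise model; each chosen label is flipped to a different class)."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in rng.sample(range(len(noisy)), int(len(noisy) * frac)):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

# Flip 10% of 100 hypothetical labels
noisy = add_class_noise(["Lying"] * 100, ["Lying", "Walking"], seed=1)
```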
4.1 RESULTS FOR EACH CLASSIFIER:
The table below describes the 12 possible combinations of our 4 conditions with the 3
selected classifiers. We ran each combination 10 times (seeds 1-10) and computed the
average and variance of its accuracy:
For each of the three classifiers (J48, Decision Table, and Random Tree), E1 to E4
denote its performance under the four conditions:
E1 – 66:34 split, attributes without noise
E2 – 66:34 split, attributes with noise
E3 – 75:25 split, attributes without noise
E4 – 75:25 split, attributes with noise
RANDOM TREE - In Random Tree, we ran four experiments, E1 to E4:
E1- 66 -34 split, without noise
E2- 66-34 split, with noise
E3- 75-25 split, without noise
E4- 75-25 split, with noise
E1- 66 -34 split, without noise
Seeds 1–10: Random Tree, 66% split, accuracy 72.9058 (identical for every seed)
Average: 72.9058    Variance: 0
E2 – 66-34 split, with noise
Seeds 1–10: Random Tree, 66% split, accuracy 59.6546 (identical for every seed)
Average: 59.6546    Variance: 0
E3- 75-25 split, without noise
Seeds 1–10: Random Tree, 75% split, accuracy 73.3807 (identical for every seed)
Average: 73.3807    Variance: 0
E4- 75-25 split, with noise
Seeds 1–10: Random Tree, 75% split, accuracy 59.4034 (identical for every seed)
Average: 59.4034    Variance: 0
4.2 RELATIVE ACCURACY OF EXPERIMENTAL DESIGN:
[Bar chart: Random Tree accuracy at the 66% and 75% splits, with and without noise]
[Grouped bar chart: accuracy of J48, Decision Table, and Random Tree across the four experimental conditions]
5. ROC CURVES:
The ROC curve is a fundamental tool for diagnostic test evaluation. The ROC curve is created
by plotting the true positive rate (TPR) against the false positive rate (FPR) at various
threshold settings.
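The construction of an ROC curve can be sketched by sweeping a threshold over the predicted scores, highest first. This is a simplified illustration (it does not treat tied scores specially); the scores and labels below are made up.

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) points by lowering the decision threshold over
    the predicted scores, one instance at a time.

    scores: predicted probability of the positive class per instance.
    labels: 1 for a positive instance, 0 for a negative one.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for score, label in ranked:
        if label:
            tp += 1   # true positive gained: curve moves up
        else:
            fp += 1   # false positive gained: curve moves right
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# A classifier that ranks both positives above both negatives
pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
```

A perfect ranking like the one above gives an area under the curve of 1.0, while random guessing tends toward 0.5, which is why the area under the ROC curve is used below to compare classifiers.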
5.1 ROC CURVE – KNOWLEDGE FLOW
The Knowledge Flow in Weka is used to create multiple ROC curves for different class
values against different classifiers.
• ArffLoader component is used to load the data set into the knowledge flow.
• ClassAssigner component is used to choose the class attribute from the data set.
• ClassValuePicker component is used to choose a class value.
• TrainTestSplitMaker component is used because we apply a percentage split (66%)
to the data set.
• The final list of classifiers is added, and for each classifier a
ClassifierPerformanceEvaluator component is added to evaluate it.
• Finally, the ROC curve chart is produced using the ModelPerformanceChart component.
The following graphs show the ROC curves obtained for each class value against the
three classifiers.
5.2 SINGLE CLASS VS 3 CLASSIFIERS:
ROC CURVE – BENDING1 CLASS VS 3 CLASSIFIERS
ROC CURVE – BENDING2 CLASS VS 3 CLASSIFIERS
ROC CURVE – CYCLING CLASS VS 3 CLASSIFIERS
ROC CURVE – LYING CLASS VS 3 CLASSIFIERS
ROC CURVE – SITTING CLASS VS 3 CLASSIFIERS
ROC CURVE – STANDING CLASS VS 3 CLASSIFIERS
ROC CURVE – WALKING CLASS VS 3 CLASSIFIERS
5.3 ALL CLASSES VS ALL CLASSIFIERS:
ROC CURVE – 7 CLASSES VS 3 CLASSIFIERS
6. Principal Component Analysis:
To validate our results, we performed Principal Component Analysis (PCA), which created
a new data set. Using PCA, we obtained new attributes whose values are linear
combinations of the previous attribute values. Tests performed on this data set did not
yield any improvement in accuracy with the introduction of the new attributes. The PCA
results are given below.
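As an illustration of the idea behind PCA (a sketch, not the Weka filter we actually used), the first principal component can be estimated by power iteration on the covariance matrix of the mean-centered data; the 2-D points below are made up.

```python
def first_principal_component(data, iters=200):
    """Estimate the first principal component of a small dataset via
    power iteration on its covariance matrix (pure-Python sketch)."""
    n, d = len(data), len(data[0])
    # Mean-center each attribute
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Covariance matrix of the centered data
    cov = [[sum(centered[i][a] * centered[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    # Power iteration converges to the dominant eigenvector
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical points spread mostly along the y = x direction
data = [[0, 0], [1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9]]
pc1 = first_principal_component(data)  # close to (0.707, 0.707)
```

Projecting the data onto such components rotates the attribute space without adding information, which is consistent with the unchanged accuracies reported below.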
FULL TRAINING SET ACCURACY
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0459                               82.9541
OneR              54.2106                               45.7894
Naïve Bayes       64.2487                               35.7515
J48               88.4964                               11.5036
Decision Table    72.4118                               27.5882
Random Tree       98.8612                                1.1388
CROSS VALIDATION ACCURACY
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0459                               82.9541
OneR              41.8973                               58.1027
Naïve Bayes       64.1658                               35.8342
J48               78.4536                               21.5464
Decision Table    66.7606                               33.2394
Random Tree       74.5306                               25.4694
PERCENTAGE SPLIT ACCURACY
Classifier        Correctly Classified Instances (%)   Incorrectly Classified Instances (%)
ZeroR             17.0253                               82.9747
OneR              41.1531                               58.8469
Naïve Bayes       63.5401                               36.4599
J48               77.1743                               22.8257
Decision Table    65.6013                               34.3987
Random Tree       71.0673                               28.9325
7. CONCLUSION:
Accuracy:
Looking at the accuracy of the classifiers, we conclude that J48 has the best
accuracy: 77.1743%.

Area under ROC:
Looking at the ROC curves plotted for the three classifiers across the various class
values, the J48 classifier appears the most efficient because it has a larger area under
the curve than Decision Table and Random Tree. Among the values of the class attribute
(Activity), the LYING class produced the most accurate results, with the largest area
under the curve of all the class values.

Experimental Design:
The results from the experimental design show that including 10% noise in our dataset
caused an approximate dip of 8-9% in accuracy.

Overall:
The conclusions drawn earlier on the original dataset also hold here: taking into
consideration the results of the factorial experimental design, the test on the full
training set, cross-validation, and the percentage-split test, the J48 algorithm yielded
the maximum accuracy. The ROC curves likewise show that the J48 classifier gives the
maximum prediction accuracy, and its area under the ROC curve tends to be the largest.
8. REFERENCES:
• F. Palumbo, C. Gallicchio, R. Pucci and A. Micheli, "Human activity recognition using
multisensor data fusion based on Reservoir Computing," Journal of Ambient Intelligence
and Smart Environments, 2016.
https://www.researchgate.net/publication/298911566_Human_activity_recognition_using_multisensor_data_fusion_based_on_Reservoir_Computing
• F. Palumbo, P. Barsocchi, C. Gallicchio, S. Chessa and A. Micheli, "Multisensor data
fusion for activity recognition based on reservoir computing," in Evaluating AAL Systems
Through Competitive Benchmarking, Communications in Computer and Information Science.
https://www.researchgate.net/publication/258029665_Multisensor_Data_Fusion_for_Activity_Recognition_Based_on_Reservoir_Computing