SlideShare a Scribd company logo
1 of 15
Download to read offline
DDSGA: A Data-Driven Semi-Global Alignment
Approach for Detecting Masquerade Attacks
Hisham A. Kholidy, Fabrizio Baiardi, and Salim Hariri, Member, IEEE Computer Society
Abstract—A masquerade attacker impersonates a legal user to utilize the user services and privileges. The semi-global alignment
algorithm (SGA) is one of the most effective and efficient techniques to detect these attacks but it has not reached yet the accuracy and
performance required by large scale, multiuser systems. To improve both the effectiveness and the performances of this algorithm, we
propose the Data-Driven Semi-Global Alignment, DDSGA approach. From the security effectiveness view point, DDSGA improves the
scoring systems by adopting distinct alignment parameters for each user. Furthermore, it tolerates small mutations in user command
sequences by allowing small changes in the low-level representation of the commands functionality. It also adapts to changes in the
user behaviour by updating the signature of a user according to its current behaviour. To optimize the runtime overhead, DDSGA
minimizes the alignment overhead and parallelizes the detection and the update. After describing the DDSGA phases, we present the
experimental results that show that DDSGA achieves a high hit ratio of 88.4 percent with a low false positive rate of 1.7 percent. It
improves the hit ratio of the enhanced SGA by about 21.9 percent and reduces Maxion-Townsend cost by 22.5 percent. Hence,
DDSGA results in improving both the hit ratio and false positive rates with an acceptable computational overhead.
Index Terms—Masquerade detection, sequence alignment, security, intrusion detection, attacks
Ç
1 INTRODUCTION
A masquerader is an attacker who authenticates as a
legal user by stealing its credentials or by violating the
authentication service. An insider masquerader is a legal
system user that misuses his/her privileges to access dis-
tinct accounts and perform unauthorized actions. An out-
sider aims to utilize all the privileges of a legal user.
Alternative implementations of this attack [1] do exist, such
as duplication or ex-filtration of user password, installation
of software with backdoors or malicious code, eavesdrop-
ping and packet sniffing, spoofing and social engineering
attacks. These attacks may leave some trail in log files that,
after the fact, can be linked to some user. In this case, a log
analysis by a host-based IDS remains the state-of-the art to
detect these attacks. Attacks that do not leave an audit trail
in the target system may be discovered by analyzing the
user behaviors through masquerade detection. At first, mas-
querade detection builds a profile for each user by gathering
information such as login time, location, session duration,
CPU time, commands issued, user ID and user IP address.
Then, it compares these profiles against logs and signals as
an attack any behaviour that does not match the profile. The
current detection approaches have not achieved the level of
accuracy and performance for practical deployment in spite
of the large amount of information they used to build a pro-
file such as command line commands, system calls, mouse
movements, opened files names, opened windows title, and
network actions. Semi-global alignment (SGA) [2] is one of
the most efficient detection algorithms and its accuracy was
improved by Coull et al. [3]. We refer to this new improve-
ment as “Enhanced-SGA”. This paper introduces the Data-
Driven Semi-Global Alignment (DDSGA) approach, which
improves both the detection accuracy and the computa-
tional performance of the Enhanced-SGA and of HSGAA
[4] that is also based upon SGA. The main idea underlying
DDSGA is to consider the best alignment of the active ses-
sion sequence to the recorded sequences of the same user.
After discovering the misalignment areas, we label them as
anomalous and several anomalous areas are a strong indica-
tor of a masquerade attack. DDSGA improves the security
efficiency by using not only lexical matching such as string
matching or longest common substring searches, but also
by tolerating small mutations in the sequences with small
changes in the low-level representation of the user com-
mands. To this purpose, a command can be aligned with
one that implements the same functionalities. To increase
the hit ratio and reduce both false positive and false nega-
tive rates, DDSGA pairs each user with distinct gap inser-
tion penalties according to the user behavior. Furthermore,
it improves both the alignment scoring system and the
update phase of Enhanced-SGA to tolerate changes in
behaviours without significantly reducing the alignment
score. To reduce both the runtime overhead and the mas-
querader live time inside the system, DDSGA implements
the detection and update operations in parallel threads and
simplifies the alignment. After highlighting the SEA dataset
[5] that we use to compare DDSGA against other
 H.A. Kholidy is with the Faculty of Computers and Information,
Fayoum University, Fayoum, Egypt, the Department of Computer Sci-
ence and Engineering, Qatar University, Doha, Qatar, the NSF Cloud
and Autonomic Computing Center, College of Electrical and Computer
Engineering, University of Arizona, Tucson, Arizona, and the Dipar-
timento di Informatica, Universita di Pisa, Pisa, Italy.
E-mail: hisham_dev@yahoo.com.
 F. Baiardi is with the Department of Computer Science, Pisa University,
Pisa 56017, Tuscany, Italy. E-mail: baiardi@di.unipi.it.
 S. Hariri is with the NSF Cloud and Autonomic Computing Center,
College of Electrical and Computer Engineering, University of Arizona,
Tucson, Arizona. E-mail: hariri@ece.arizona.edu.
Manuscript received 3 June 2012; revised 27 Jan. 2014; accepted 2 May 2014.
Date of publication 2 June 2014; date of current version 13 Mar. 2015.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TDSC.2014.2327966
164 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
1545-5971 ß 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
techniques, Section 2 briefly reviews masquerade detection.
Section 3 discusses the SGA algorithm. Section 4 introduces
DDSGA and describes its three main phases namely, config-
uration, detection, and update. It also describes the modules
and experimental results for each phase and compares them
against other approaches. Lastly, Section 5 draws some con-
clusion and outlines future work.
1.1 SEA Dataset
Currently, five data sets may be used to evaluate masquerade
detection techniques: SEA, Greenberg, Purdue, RUU, and
our CIDD dataset [5], [7], [8], [9], and [10]. The SEA dataset
has become the de-facto standard because it has been used by
many researchers working on masquerade detection. In prin-
ciple, this simplifies the comparison of distinct approaches.
SEA consists of commands and user names collected from
UNIX acct audit data. The data describe 50 different users
each issuing 15,000 commands. The first 5,000 commands of
each user are assumed to be genuine. The remaining com-
mands of a user are splitted into 100 blocks of 100 commands
each. Each block maybe seeded with masquerade users, i.e.
with commands of other users. There is a 1 percent probabil-
ity that a block is a masquerader and, in this case, there is a
probability of 80 percent that the next one is a masquerader
as well. As a result, approximately 5 percent of the test data
are masquerades. One of the most critical defects of SEA is
that it neglects command arguments and parameters. Due to
the way acct collects audit data, it is difficult to distinguish
commands typed by human users from those generated by
shell scripts. Due to the repetitive execution of shell scripts,
for some users this may result in a very regular pattern with a
few commands. To address some methodological shortcom-
ings, two alternative configurations of SEA, namely 1v49 [11]
and SEA-I [12], have been defined but they have not been
widely used and do not truthfully represent real world mas-
querader activities.
2 RELATED WORK IN MASQUERADE DETECTION
We briefly outline some masquerade detection approaches.
The uniqueness approach [6] assumes that commands that
have not been seen in the training data indicate a masquer-
ader. Moreover, the probability that a masquerader has
issued a command is inversely related to the number of
users that use such a command. While uniqueness has a
relatively poor performance, it is one of the few
approaches that target false alarm rate of 1 percent. Na€ıve
Bayes One-step Markov [13] is based upon one-step transi-
tions from a command to the next. It builds two transition
matrices for each user from, respectively, the training data-
base and the testing one and it triggers an alarm when
these matrices noticeably differ. The false alarm rate of this
method is not satisfactory. The Hybrid Multi-Step Markov
method [14] is based on Markov chains. When a Markov
model cannot be adopted because too many commands in
the testing data have not been observed in the training, a
simple independence model with probabilities estimated
from a contingency table of users versus commands may
be more appropriate. Schonlau et al. [6] toggled between a
Markov model and the simple independence one. This
approach achieves the best performance among the
considered methods. The main idea underlying the com-
pression approach [6] is that new and old data from the
same user should compress at about the same ratio.
Instead, data from a masquerading user will compress at a
different ratio. Among the proposed methods, this results
in the worst performance. Incremental Probabilistic Action
Modeling (IPAM) [15] is based upon one-step command
transition. It estimates the probability of each transition
from the training data set and uses it to predict the
sequence of user commands. Too many false predictions
signal a masquerader. This method is in the lowest-per-
forming group. Sequence-matching [16] computes a simi-
larity match between the user profiles and the
corresponding sequence of commands. Any score lower
than a threshold signals a masquerader. Its performance
on the SEA data set is not very high. Support Vector
Machine (SVM) [17] denotes a set of machine learning
algorithms for binary data classification. It exploits a set of
support vectors in the training data that outlines a hyper-
plane in feature space [17]. SVM can potentially learn a
large set of patterns but it results in high false alarm rates
and a low detection rate. Furthermore, the user profile has
to be updated to reduce false alarms. Szymanski and
Zhang [18] propose a recursive data mining approach that
discovers frequent patterns in the sequence of user com-
mands, encodes them with unique symbols, and rewrites
the sequence with the new coding. Then, a one-class SVM
classifier detects masqueraders. This approach demands
mixing user data and may not be ideal or easily imple-
mented in real-world. It also suffers of some of the SVM
shortcomings. Maxion and Townsend [11] applied a Na€ıve
Bayes classifier widely used in text classification tasks and
that classifies sequences of user-command data into either
legitimate or masquerader. The method has not yet
achieved the level of accuracy for practical deployment.
Dash et al. [19] introduced an episode based Na€ıve Bayes
technique that extracts meaningful episodes from a long
sequence of commands. The Na€ıve Bayes algorithm identi-
fies these episodes either as masquerade or normal accord-
ing to the number of commands in masquerade blocks.
The proposed technique significantly improves the hit ratio
but it still has high false positive rates and it does not
update the user profile. Alok et al. [20] integrates a Na€ıve
Bayes approach with one based on a weighted radial basis
function, WRBF, similarity. The Na€ıve Bayes algorithm
includes information on the probabilities of commands by
one user over the other users. Instead, the WRBF similarity
takes into account the similarity measure based on the fre-
quency of commands, f, and the weight associated with the
frequency vectors. Here, f is a similarity score between an
input frequency vector and a frequency vector from the
training data set. The experiments confirm that WRBF-NB
significantly improves the hit ratio but, as the previous
approach, it suffers from the high false positive rates. Fur-
thermore, it increases the overall overhead by computing
both the Na€ıve Bayes and the WRBF and integrating their
results. Lastly, it does not update the user profile and
neglects the low level representation of user commands.
Dash et al. [21] introduced an adaptive Na€ıve Bayes
approach based on the premise that both the commands of
a legitimate user and those of an attacker may differ from
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 165
the trained signature but the deviation of the legitimate
user is momentary, whereas the attacker one persists lon-
ger.The improvement in the performance of detection has
been empirically verified using several data sets. However,
the false positive rate is still high. Malek and Salvatore [22]
have modeled user OS commands as bag-of-words without
timing information. They used a one-class support vector
machine to achieve a better performance than threshold-
based comparison with a distance metric. The ability of
sequence alignment algorithms to find areas of similarity
can be exploited to differentiate legitimate usage from mas-
querade attacks. To do so, a signature of the normal user
behavior should be created by collecting sequences of audit
data. Then, this signature is aligned with audit data from
monitored sessions to find areas of similarity. Areas that
do not align properly are assumed to be anomalous, and
several anomalous areas are a strong indicator of masquer-
ade attacks [2]. Among sequence alignment algorithms
such as global, local and semi-global alignments, the most
efficient one is semi-global alignment. Adesina et al. [23]
modified the scoring function of the semi-global alignment
algorithm to improve the detection efficiency. They used a
systematically generated ASCII coded sequence audit data
from Windows and UNIX systems as simulations for the
intrusion data set. However, a real time evaluation using
one of the current data sets is missing.
Coull et al. [2] modified the Smith-Waterman align-
ment algorithm [24] to a semi-global alignment algorithm
that is described with some evolutions and enhancements
[3] in the next section. Fig. 1 compares different techni-
ques in terms of the Receiver Operator Characteristic
(ROC) curves [3], [25] and the Maxion-Townsend cost
function [11] and it shows that SGA achieves the best
performances. The Maxion-Townsend cost function rates
a masquerade detection algorithm and defines the detec-
tion cost as in Eq. (1).
Cost ¼ 6 Ã False Positive Rate þ ð100 À HitRatioÞ: (1)
3 SGA AND THE ENHANCED-SGA
This section describes in some details SGA and some pro-
posals to enhance it.
SGA is more accurate and efficient than current
approaches. It has low false positive and missing alarm
rates and high hit ratio. It can be adopted in heterogeneous
environment with distinct operating system because it can
be applied to distinct audit data such as command line
entries, mouse movements, system calls, registry events, file
and folder names, sequence of opened windows titles and
network access audit data. SGA aligns large sequence areas
as in global alignments, while preserving the nature of local
alignments. It can ignore both prefixes and suffixes and it
only aligns the conserved area with the maximal similarity.
Fig. 2 shows an application of SGA and the influential
parameters of an alignment namely: match score, mismatch
score, test_gap penalty, signature_gap penalty, and detec-
tion threshold.
To discover the optimal alignment, SGA exploits
dynamic programming. To this purpose, it initializes an
m þ 1 by n þ 1 score matrix, M, and then determines the
value of each position of M by one of three transitions, see
Fig. 3:
1) Diagonal transition. It aligns the i À 1 symbol in the
signature sequence with the j À 1 symbol in the test
sequence. The alignment score depends upon the
lexical match of the symbols being aligned and it is
added to Mði À 1; j À 1Þ.
2) Vertical transition. A gap is inserted into the signa-
ture sequence and it is aligned with the jÀ1 sym-
bol in the test sequence. The gap penalty is added
to Mði; j À 1Þ.
3) Horizontal transition. A gap is inserted into the test
sequence and aligned with the iÀ1 symbol in the
Fig. 1. ROC curves for detection techniques that use SEA Dataset.
Fig. 2. An alignment example using SGA algorithm.
Fig. 3. The three transitions to fill each cell in the transition-matrix.
166 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
signature sequence. The gap penalty is added to
Mði À 1; jÞ.
The SGA scoring system determines the gap penalties in
transitions 2 and 3. The actual alignment depends upon the
maximum value of the three transitions and its value is
assigned to M(i, j). M(i, j) is the score of the optimal align-
ment of all symbols up to location iÀ1 in the signature
sequence and jÀ1 in the test one. Hence, M(m, n) as the
score of the optimal alignment of the two sequences. We
can rebuild this alignment by tracing back the transitions
that have produced the score. M(m, n) measures the similar-
ity of the two sequences according to the scoring system
used and it is an indicator of masquerade attacks.
3.1 The Enhanced-SGA
Coull and Syzmanski [3] modified the SGA algorithm to
handle the problems of the traditional Smith-Waterman
alignment algorithm from two perspectives. The first one
considers that the usage patterns of legal users may
change due to changes in their role or to new software. A
static user signature is therefore prone to label as attacks
some variations of legal users. To avoid these false
positives, the signature is updated as new behaviour is
encountered by exploiting the ability of SGA of discover-
ing areas of similarity.
Furthermore, as discussed in Section 4.2.3, they defined
two scoring systems, the command grouping and binary
scoring systems, to set the alignment scores and the gap
insertion penalties. The signature update scheme is applied
with the binary scoring, their most efficient system. This
scheme augments both the current signature sequence with
information on the new behaviours and the user lexicon
with the new commands the user invokes. The scheme also
introduces a threshold for each user profile to ensure that
both the signature sequence and user lexicon remain free of
tainted commands from masquerade attacks. The threshold
is used in both detection and update processes, and it is
built through a snapshot of the user signatures. The other
perspective considers that the Smith-Waterman algorithm
is computationally expensive and impractical to detect mas-
querade attacks on multi-user systems. By selectively align-
ing only the portions of the user signature with the highest
success probability, Heuristic Aligning [3] can significantly
reduce the computational overhead with almost no loss of
accuracy in detection. These modifications have been tested
on the SEA data set to simplify the comparison with other
approaches.
4 THE DATA-DRIVEN SEMI-GLOBAL ALIGNMENT
APPROACH
DDSGA is a masquerade detection approach based upon
Enhanced-SGA [3]. It aligns the user active session
sequence to the previous ones of the same user and it
labels the misalignment areas as anomalous. A masquer-
ade attack is signaled if the percentage of anomalous areas
is larger than a dynamic, user dependent threshold.
DDSGA can tolerate small mutations in the user sequences
with small changes in the low level representation of user
commands and it is decomposed into a configuration
phase, a detection phase and an update one. The
configuration phase, computes, for each user, the align-
ment parameters to be used by both the detection and
update phases. The detection phase aligns the user current
session to the signature sequence. The computational per-
formance of this phase is improved by two approaches
namely the Top-Matching Based Overlapping (TMBO) and
the parallelized approach. In the update phase, DDSGA
extends both the user signatures and user lexicon list with
the new patterns to reconfigure the system parameters.
Fig. 4 shows these phases and the modules that implement
them that we discuss in the following.
4.1 DDSGA Main Features and Improvements
DDSGA improves both the computational and the security
efficiency of the Enhanced-SGA.
From a computational perspective, DDSGA improves the
performance of both the detection and the update through a
parallel multithreading scheme and a new Top-Matching
Based Overlapping approach that improves the Heuristic
Aligning and saves computational resources. While the
Heuristic Aligning splits the signature sequence into a fixed
overlapped subsequences of size 2n, where n is the size of
the test sequence, TMBO simplifies the alignment through
shorter overlapped subsequences. Besides saving computa-
tional resources, this speeds up the detection and update
phases and consequently reduces the masquerader live
time inside the system.
With respect to the accuracy of masquerade detection,
DDSA introduces distinct scoring parameters for each user,
namely the gap penalties and the overlapping length. The
adoption of distinct scoring parameters for each user
improves the detection accuracy, the false positive and false
negative rates and increase the detection hit ratio with
respect to the traditional and enhanced SGA that use the
same parameters for any user. This neglects differences
among the behaviours of distinct users and reduces the
accuracy of detection, because the alignment cannot tolerate
even slight changes in the user behaviour over time. Start-
ing from the data of each user, the configuration phase of
DDSGA computes the scoring parameters that result in the
maximum alignment score for the considered user. To
improve the accuracy of the alignment, DDSGA integrates
binary and command group, two scoring systems suggested
by Coull et al.’s, into two other scoring systems, restricted
and free permutation. The resulting systems tolerate permu-
tations of previously observed patterns with a low
Fig. 4. DDSGA Phases and modules.
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 167
reduction of the score. To tolerate changes in the low-level
representation of commands with the same functionality,
they classify user commands into several groups and align
two commands in the same group without reducing the
alignment score. The DDSGA configuration phase also cre-
ates a dynamic threshold for each user to be used by both
the detection phase and the update one. While Enhanced-
SGA builds this threshold through a snapshot of a user
profile, DDSGA builds a more sensitive and dynamic
threshold by considering any data in the profile. Further-
more, DDSGA runs two update modules: the inline and
long term modules. The inline module updates the user sig-
nature patterns, the user lexicon list, and their correspond-
ing command categories in a reconfiguration phase. The
long-term module updates the system with the latest
changes in the alignment parameters. It also updates the
dynamic threshold values, scoring parameters, and the
overlapping length i.e., the length of the overlapped signa-
ture subsequence according to maximum number of
inserted test gaps. The dynamic threshold, the scoring sys-
tems, and the two update modules enable DDSGA to toler-
ate slight changes in the user behaviour overtime. Table 1
briefly compares DDSGA and Enhanced-SGA.
In the following sections we detail the current implemen-
tation of DDSGA.
4.2 The Configuration Phase
We briefly define the parameters that this phase computes
for each user and that are used by the following phases. The
detailed calculation of each parameter is described in the
remainder of this section.
 Optimal gap penalties. The optimal test gap penalty
and the optimal signature gap penalty are paid
when inserting a gap into the test sequence and the
signature one respectively. While in Enhanced-SGA
all the users share the same fixed penalties, DDSGA
computes two distinct penalties for each user accord-
ing to distinct user behaviours. In this way, DDSGA
determines the smallest penalties corresponding to
the maximum alignment score.
 Mismatch score. DDSGA evaluates the mismatch
score through the restricted permutation scoring
system and the free permutation one. These systems
integrate the command grouping and binary scoring
systems defined in [3].
 Average optimal threshold. DDSGA computes a dis-
tinct threshold value for each user according to the
changes in the user behavior. The threshold is used
in both the detection and update phases and its sen-
sitivity affects the accuracy of both phases, as dis-
cussed in the average threshold module.
 Maximum factor of test gaps (mftg). This parameter
relates the largest number of gaps inserted into the
user test sequences to the length of these sequences.
DDSGA computes a distinct parameter for each user
and updates it in the update phase. The detection
phase uses the parameter to evaluate the maximum
length of the overlapped signature subsequences in
the TMBO.
The configuration phase is implemented using the fol-
lowing five modules.
4.2.1 DDSGA Initialization Module
To provide an independent set of test and signature sequen-
ces for the configuration phase of each user, we split the
user signatures into nt non-overlapped-blocks each of
length n and use them as test sequences to the user. These
sequences represent all given combinations of users signa-
ture sequences and all the modules in the configuration
phase use them to compute the user alignment parameters.
These sequences differ from those used in the detection
phase. To define the signature sequences, we divide the
user signature sequence into a set of overlapped groups of
length m ¼ 2n. In this way, the last n symbols of a block
also appear as the first n of the next one. ns, the number of
signature subsequences is equal to nt-1 groups to consider
all possible adjacent pairs of the signature sequences of size
n. We have chosen a length 2n to overlap the signature
sequence because any particular alignment uses subsequen-
ces with a length that is, at most, 2n. Any longer subse-
quence necessary scores poorly, because of the number of
gaps to be inserted. In fact, since the scoring alignment
depends upon the match between the test and the signature
subsequences, the former should be shorter than the latter.
As a consequence, the signature sequence for this phase
consists of 2n command produced by overlapping the sig-
nature sequence. Hence, there are nt-1 groups that are cre-
ated as in Fig. 5.
In the case of SEA, ns ¼ 49 subsequences and a tested
block consists of 100 commands because SEA marks
each 100 command block as an intrusion or a non-intru-
sion. Since SEA does not supply any information on
which commands in a block correspond to the intrusion,
the correctness of larger or smaller blocks cannot be
checked. The dynamic average threshold for both the
detection and update phases is the average score of all
the alignments between each test sequence of length 100-
command and the 49 overlapped 2n signature subse-
quences. A test sequence is not aligned with the signa-
ture subsequence that contains the test sequence itself
because this returns a 100 percent alignment score. We
will detail this step in the average threshold module.
TABLE 1
A Comparison between DDSGA and Enhanced-SGA
Enhanced-SGA DDSGA
Accuracy 1- Command group-
ing and binary scor-
ing systems
2- Fixed and equal
gab insertion penal-
ties for all the users.
3- Compute the
threshold for each
user using snapshots
of user data.
4- Signature update.
1- Free and restricted per-
mutation scoring systems
2- Define gab insertion
penalties for each user
independently, based on
user previous data.
3- Compute an optimal
threshold for each user
with all user signature
sequences.
4- Signature update (inline
and long term).
Computational
Performance
Heuristic Aligning. 1-TMBO Approach.
2-Parallel computation.
168 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
4.2.2 User’s Lexicon Categorization Module
This module builds a lexicon for each user, i.e. list of lexical
patterns classified according to their functionality and that
is used to tolerate changes in the low level representation of
a pattern. In SEA data set, these patterns are UNIX com-
mands. This module combines the user lexicon list and com-
mand grouping approach introduced in [3].
Table 2 shows an example of a classification of user com-
mands into several categories according to their functionali-
ties e.g., since a user can use either cat or vi to write a file,
the two commands can be aligned because both belong to
the same group, “Text processing”. In the same way, grep
can be aligned with find because they both belong to
“searching”.
4.2.3 Scoring Parameters Module
Starting from the test and signature subsequences of each
user, this module returns three parameters: optimal test gap
penalty, optimal signature gap penalty, and mismatch score.
At first, the module inserts into the list top_match_list all
the test sequences with the top match score. This list enables
DDSGA to align the top match test sequences only rather
than all the nt sequences. To build the top_match_list, we
select the highest match scores for all the nt sequences. The
match score MS of a test sequence is computed as in Eq. (2).
Then, the top_match_list sequences are aligned to the ns
overlapped signature subsequences using any possible gap
penalty, i.e. the test gap penalty ranges from 1 to n, while
the signature gap penalty ranges from 1 to n. The mismatch
score is 0 and the match score is þ2.
MS ¼
Xn
i¼1
Xnt
k¼1
MinðNoccur Itself pið Þ; Noccur seqkðpiÞÞ; (2)
where:
- n is the length of the test sequence, nt is number of
test sequences,
- Noccur_Itself (pi) is number of occurrence of pattern
i in the current evaluated sequence,
- Noccur_Seq k(pi) is number of occurrence of pattern
i in test sequence k.
By computing each alignment separately, we produce sev-
eral alignment scores for each combination of the scoring
parameters and select as the optimal parameters those
resulting in the maximum score. If several alignments result
in this score, we select the one with the smallest penalties of
inserting a gap into the signature subsequence and test one,
respectively. To justify this solution, consider that the high-
est alignment score denotes a high level of alignment
between the test and the signature subsequences and SGA
normally subtracts these penalties from the score. We
have applied the previous steps to each user in SEA data
set to define the corresponding optimal penalties. Fig. 6
and Table 3 show the penalties for user 1.
This algorithm computes the mismatch_score parame-
ter using the restricted and the free permutation scoring
systems as shown in Figs. 7a and 7b. Both systems inte-
grate command grouping and binary scoring [3] and are
Fig. 5. The non-overlapped test sequences and the overlapped signa-
ture subsequences.
TABLE 2
User 1 Lexicon List
User’s Lexicon List Functional Group
cat File system, Text processing
vi Text processing
sh Shell programming
env User environment
kill Process Management
reaper Networking
mail Internet Applications
hostname System Information
nawk Development, Shell
Programming
getopt Development, Shell
Programming
grep Searching, Text processing
find Searching
Fig. 6. The best alignment score that corresponds to the optimal combi-
nations of gap penalties for user 1 in SEA Dataset.
TABLE 3
Example of the Top Match Scores of User 1
Max align-
ment score
Test
sequence
index (i)
Signature
gap penalty
Test gap
penalty
Optimal
combina-
tion
154 2 99 95 -
154 5 100 95 -
154 8 97 100 -
154 13 97 99 ok
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 169
based on distinct assumption about mutations in the audit
data. Command grouping assumes that sequences of audit
data mutate by replacing some original symbols with func-
tionally similar ones. Command grouping assigns a static
reward of +2 to exact matches and scores a mismatch
through the functional groups of the two commands. If a
command in the signature aligns with a mismatched com-
mand in the test sequence but that belongs to the same
group, the mismatch score is set to +1 rather than to À1. The
assumption on mutation of the binary scoring system fol-
lows the results in [26] where mutation does not replace
base symbols in the user lexicon. In fact, these symbols are a
strong indicator of a legal use, but the original base symbols
can be permuted in some fashion [3]. The binary scoring
system rewards exact matches by adding +2 to the align-
ment score. A mismatch is scored to +1 if the mismatched
command has previously occurred in the user lexicon and
to À1 otherwise. Both scoring systems penalize the insertion
of a gap into the signature sequence and into the test one
by, respectively, À3 and À2.
Restricted permutation scoring system. It rewards a mis-
matched command in the test sequence if the two commands
belong to the same group. These groups are manually cre-
ated from a set of common UNIX commands in the signature
sequences. Furthermore, the mismatched command of the
test sequence should have previously occurred in the user
lexicon. This tolerates various permutations of previously
observed patterns without reducing the score significantly.
Free permutation scoring system. This system is more toler-
ant than the previous one because it does not require that
the mismatched commands belong to the same group and it
even rewards a mismatched command provided that it
belongs to the user lexicon. This tolerates a larger number
of permutations of the signature patterns without reducing
the score significantly.
4.2.4 Average Threshold Module
This module computes a dynamic average threshold for
each user to be used in the detection phase and that may be
updated in the update phase. In the detection phase, if thez
alignment score is lower than the threshold, then the
behavior is classified as a masquerade attack. While
the Enhanced-SGA only considers snapshots of the user
data, this module considers all the user data. This improves
the sensitivity of the threshold that, in turns, determines the
one of the detection. The module uses the same test and sig-
nature subsequences of the initialization module and it can
work with a test sequence of any length because, in practical
deployment user test session can be of any length. At first,
the module builds a trace-matrix, to record all alignment
details of the current user. We align each test sequence to all
ns overlapped signature subsequences and run the module
twice, one for each scoring systems to select the one to be
used. The detection phase uses the output of this module to
compare the two scoring systems. These alignments use the
optimal scoring parameters for the current user. The mod-
ule applies Eq. (3) to compute the average alignment of test
sequence i, avg_align_i, and the sub_average score for all
previous alignment scores, score_align_i, of test sequence i.
In the equation, max_score_align_i is the largest alignment
score resulting from the alignment of sequence i to all ns sig-
nature subsequences. Then, the module applies Eq. (4) to
compute the detection_update_threshold, the overall aver-
age of the nt sub average scores.
avg align i ¼
Pns
j¼1 score align ij
 
=ns
max score align i
à 100 (3)
detection update threshold ¼
Pnt
k¼1 avg align ik
nt
(4)
To compute the optimal alignment path and number of
test gaps (ntg) to be inserted into test sequence i, we apply
the trace backward algorithm (TBA) to trace back the transi-
tions to derive each optimal score. An example is shown in
Table 4.
The Trace backward algorithm. Any dynamic programming
algorithm can select the optimal alignment path by tracing
back through the optimal scores computed by SGA. The
TBA implements one of the known dynamic programming
algorithms to build the backward-transition-matrix, see
Fig. 8, in a way that helps in filling the trace-matrix in
Fig. 7. (a) The restricted permutation scoring system, (b) The free per-
mutation scoring system.
170 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
Table 4 as previously discussed. The alignment process also
applies the TBA to extract the final alignment path. The
TBA traces back the transition-matrix according to the
labels that the alignment has inserted into this matrix. One
of four labels may be inserted: “M” if a match has occurred,
“!M” if a mismatch has occurred, “GS” or “GT” if a gap has
been inserted into a signature subsequence or into the test
one. Fig. 8 outlines the TBA and shows the transition-matrix
and the corresponding backward-transition-matrix to align
the test sequence “AWGHE” to the signature sequence
“PAWHE”.
4.2.5 Maximum Test Gap Module
We recall that the Enhanced-SGA Heuristic Aligning
decomposes the signature subsequence into 2n overlapped
subsequences because if subsequences of length n are
aligned, the maximum number of gaps that can be inserted
into the test sequence is n for all users. By tracing the SGA
algorithm, we have noticed that the maximum number of
gaps is much lower than n, the length of the test sequence,
and it differs for each user according to the level of similar-
ity among the subsequences in the user signature and to the
length of the test sequence. Even if the test sequence is long
enough, the number of gaps is at most half of the sequence
length. This means that by partitioning the signature
sequence into 2n overlapped subsequences, the Maximum
Test Gap module can divide it as in Eq. (5) to compute mftg,
the largest number of test gaps inserted into the user test
sequences by the average threshold module. The Computa-
tional Enhancement (CE) for the session alignment of a user
can be computed according to Eq. (6). We refer to the detec-
tion phase for an example that explains this section.
L ¼ n þ Max
nt
k ¼ 1
ntgk
ltsk
  
à n
 
; (5)
where:
- ntg is number of test gaps inserted to each test sequence,
- lts is the length of the test sequence,
- nt is number of the test sequences of the user, fifty in case
of SEA,
CE ¼ ðn À mftgd eÞ Ã ð100=ð2 Ã nÞÞ% (6)
Where, Maximum Factor of Test Gaps inserted into all user
test sequences (mftg) is computed as in Eq. (7).
mftg ¼ Max
nt
k ¼ 1
ntgk
ltsk
  
: (7)
4.3 The Detection Phase
We have run a complete alignment experiment based upon
the test and signature blocks of the SEA data set to evaluate
the alignment parameters and the two scoring systems. The
test blocks are the actual SEA testing data and they differ
from those described in the initialization module. To simplify
a comparison with other approaches, we use the ROC curve
and the Maxion-Townsend cost function defined in Eq. (1).
Our experimentation focuses on the effects of the align-
ment parameters on the false positive and false negative
rates and on the hit ratio. This experiment did not apply the
maximum test gap module. False positives, false negatives
and hits are computed for each user, transformed into the
corresponding rates that are then summed and averaged
over all 50 users. Equations (8), (9), and (10) show the
DDSGA metrics.
TotalFalsePositive ¼
Xnu
k¼1
fpk
nk
!
nu
!
à 100 (8)
Where:
- fp ¼ No. of false positive alarms,
- n ¼ No. of non-intrusion command sequence blocks,
- nu ¼ No. of users (50 in our case)
TotalFalseNegative ¼
Xnui
k¼1
fnk
nik
!
nui
!
à 100 (9)
Where:
- fn ¼ No. of false negatives,
- ni ¼ No. of intrusion command sequence blocks,
- nui ¼ No. of users who have at least one intrusion block
Fig. 8. The transition and Backward_Transition matrices respectively.
The thick and the thin arrows show, respectively, the optimal path that
leads to the maximum alignment score and those that do not lead to the
maximum alignment score.
TABLE 4
An Example for the Trace Matrix
Test
sequence
ID
Length of
test
sequence
(lts)
Signature sub-
sequence ID
Optimal
alignment
Number of
Test gaps (ntg)
Avg-align
of test
sequence i (%)
1 100 1 27 65
..
. ..
. ..
. ..
. ..
.
1 100 49 77 22 70.31
..
. ..
. ..
. ..
. ..
. ..
.
50 100 1 122 9
..
. ..
. ..
. ..
. ..
.
50 100 49 102 11 73.75
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 171
TotalHitRatio ¼ 100 À TotalFalseNegative: (10)
To plot the ROC curve, the experiment with the tradi-
tional SGA algorithm have used distinct values of the align-
ment parameters to obtain different false positive rates in
the x-axis and the corresponding hit ratios in y-axis. We
also repeat the experiment with distinct values of some
alignment parameters such as reward for matches and
rewards or penalties for mismatches computed by the two
scoring systems. Fig. 9 shows the ROC curve for the two
scoring systems, the Enhanced-SGA scoring systems, and
other detection approaches that also use SEA data set, based
upon the previous metrics.
According to this figure, the restricted permutation sys-
tem results in higher hit ratio with corresponding low false
positive rates. As the false positives rate increases, the free
permutation system achieves a higher hit ratio than the
restricted one because it can tolerate a large number of
mutations and deviations in user behaviours. According to
this experiment, we have adopted the restricted permuta-
tion system as a suitable scoring system for all the phases
of DDSGA. Table 5 compares DDSGA using the restricted
and the free permutation scoring systems against the cur-
rent masquerade detection approaches sorted by Maxion-
Townsend cost. The results for the various detection
approaches, including all ROC curve values, are published
in the references shown in Table 5 and are sorted by
“Maxion Townsend cost” that is used to siplify the compar-
ison by considering both the false positive and hit ratio. We
re-implemented all Coul et al. works in [2], [3] to compare
it against DDSGA.
The computational enhancement modules. The computa-
tional complexity of SGA is rather large. As an example,
it requires about 500,000 operations to test one user ses-
sion for masquerade attacks in the SEA data set, because
the length of the signature sequence is 5,000 while that
of the test sequence is 100. The resulting overhead is
unacceptable in multiuser environments like cloud
systems or in computationally limited devices like wire-
less systems. To reduce this overhead, we introduce two
computational enhancements that concern, respectively,
the Top-Matching Based Overlapping module and the
parallelized detection module that are executed in each
alignment session. In the following, we outline in details
these two modules.
4.3.1 The Top-Matching Based Overlapping Module
To align the session patterns to a set of overlapped subse-
quences of the user signatures, this module uses the
restricted permutation scoring system, Maximum Factor
of Test Gaps (mftg) in Eq. (7), and the scoring parameters
of each user. As explained in Section 4.1, the TMBO
improves the Heuritic Aligning of the Enhanced-SGA.
After splitting the signature sequence into a set of over-
lapped blocks of length L, see Eq. (5), it chooses the sub-
sequence with the highest match to be used in the
alignment process. In the worst case, all overlapped
Fig. 9. ROC curve for our two scoring systems, SGA ones and other
detection approaches.
TABLE 5
A Comparison between Our Two Scoring Systems
and the Current Detection Approaches
Approach Name Hit Ratio
percent
False Positive
percent
Maxion-Town-
send Cost
DDSGA
(Restricted
Permutation)
83.3 3.4 37.1
DDSGA (Free
Permutation)
80.5 3.8 42.3
SGA (Signature
Updating) [3]
68.6 1.9 42.8
SGA (Signature
Updating þ
Heuristic
Aligning) [3]
66.5 1.8 44.3
Na€ıve Bayes
(With
Updating) [11]
61.5 1.3 46.3
SGA (Binary
Scoring) [3]
60.3 2.9 57.1
Adaptive Na€ıve
Bayes [21]
87.8 7.7 58.4
Recursive Data
Mining [18]
62.3 3.7 59.9
Na€ıve Bayes (No
Updating) [11]
66.2 4.6 61.4
WRBF-NB [20] 83.1 7.7 63.1
Episode based
Na€ıve
Bayes [19]
77.6 7.7 68.6
Uniqueness [8] 39.4 1.4 69.0
Hybrid
Markov [14]
49.3 3.2 69.9
SGA (Previous
Scoring) [2]
75.8 7.7 70.4
Bayes 1-Step
Markov [13]
69.3 6.7 70.9
IPAM [15] 41.1 2.7 75.1
SGA (Command
Grouping) [3]
42.2 3.5 78.8
Sequence
Matching [16]
36.8 3.7 85.4
Compression [6] 34.2 5.0 95.8
172 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
subsequences should be aligned as they have the same
highest match value. We have verified that on average,
the number of alignments is rather smaller because of the
variation between the overlapped signature subsequen-
ces. The evaluation of the proposed TMBO approach
mainly depends upon two parameters: (a) Number of
average alignments for the detection process, (b) The
effect of the TMBO on false alarm rates and hit ratio.
To evaluate our approach with respect to (a) we show
through an example how it reduces the alignment com-
putations. As far as concerns (b), we use the ROC curve
and Maxion-Townsend. To evaluate TMBO, Fig. 10
shows the user session patterns to be aligned to the sig-
nature patterns. The configuration phase returns these
values: signature_gap_penalty ¼ 9, test_gap_penalty ¼ 5,
optimal_score_sys ¼ “Restricted Permutation”, detectio-
n_update_threshold ¼ 82.2%, and mftg (maximum factor
of test_gaps) ¼ 33%.
The first step of TMBO computes the following length of
the overlapped subsequences according to Eq. (5):
L ¼ n þ mftg à nd eð Þ ¼ 10 þ 33=100 à 10d eð Þ
¼ 10 þ 3:3d e ¼ 10 þ 4 ¼ 14:
With respect to the one in the initialization module, the
current overlapping runs with length L rather than 2n.
Fig. 10 shows the resulting overlapped signature subse-
quences of size L ¼ 14. The last subsequence, i.e. subse-
quence 15, may be shorter than L, but it is still longer than
the test sequence.
The second step computes the match corresponding to
each subsequence as shown in the front of each subse-
quence in Fig. 10. We only consider the matching of subse-
quence 1 because those for other subsequences are similar.
If we denote by Match(X, s), the minimum between the
occurrences of X in, respectively, a user session and in a
subsequence, then
MMMatchðsssubseq:1Þ
¼
X
ðX in A; B; C; D; E; FðMatchðX; subseq:1ÞÞ
¼
X
ðMinð3; 1Þ; Minð2; 2Þ; Minð1; 2Þ; Minð1; 2Þ;
Minð2; 2Þ; Minð1; 2ÞÞ
¼
X
ð1; 2; 1; 1; 2; 1Þ ¼ 8:
The third step chooses the top match subsequences, e.g.
subsequences 2 and 15 in the example, as the best signature
subsequences to be aligned against the test session patterns
of the user. To evaluate the reduction of the workload due
to TMBO, consider the Number of Asymptotic Computa-
tions (NAC) computed by Eq. (11). In the previous example,
TMBO reduces the number of alignments from 3000 to 280
with a saving of 90.66 percent.
NAC ¼ Avg n align à Sig len à Test len (11)
Where:
- Avg_n_align is the average number of alignments
required for one detection session over all existing users.
- Sig_len is the length of the overlapped signature
subsequence.
- Test_len is the length of the test sequence).
To determine if the session patterns of the current user
contain a masquerader, the final step compares the highest
scores of the previous two alignments against a detectio-
n_update_threshold. As explained in the update phase, if at
least one of the previous eight alignments has a score larger
than or equal to the detection_update_threshold, then an
inline update process should be executed for the signature
subsequence and the user lexicon.
The evaluation using the SEA data set shows that TMBO
reduces the maximum number of alignments from 49 to an
average of 5.13 alignments per detection, a substantial
improvement in detection scenarios. Table 6 shows the
asymptotic computations for three detection approaches.
The first is our TMBO without the inline update module.
The second one is the Heuristic Aligning with signature
update [3]. Finally, the third one is the traditional SGA algo-
rithm without the Heuristic Aligning or update feature. The
NAC per one detection session can be computed as in
Eq. (11). If we considered that each of the fifty users in SEA
data set has one active session in a multi users system, then
Total_NAC ¼ NAC Ã
50.
To evaluate false alarm rates and hit ratios, we have
tested TMBO using the ROC curve and Maxion-Townsend
score. Table 7 and Fig. 11 show that TMBO has a lower
impact on the overall accuracy than other approaches.
4.3.2 The Parallelized Detection Module
Since TMBO partitions the user signature into a set of
overlapped subsequences, we can parallelize the detection
algorithm because it can align the commands in the user
test session to each top match signature subsequence sepa-
rately. In the example of Section 4.2.1, we can run in parallel
the threads to align subsequences 2 and 15. If a thread
Fig. 10. Overlapped Signature Subsequences of Size 14.
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 173
returns a computed alignment score at least equal to detec-
tion_update_threshold, then it sends a “No Masquerader”
message and then runs an inline update of both its signature
subsequence and the lexicon of the current user. Instead, if
the alignment score is lower than the detection_update_th-
reshold, the thread raises a “Masquerader Detected” alert.
Fig. 12 shows the parallelized detection module processes.
We have run several experiments using the SEA data set
to evaluate how the parallel version of the module influen-
ces the overall performance of detection. The experiments
have used an Intel Core 2 Duo Processor E4500 (2M Cache,
2.20 GHz) with 4 GB of memory and running Windows 7
SP1, Machine “A”. Machine “B” is Intel Core i3-2330M (3M
Cache, 2.30 GHz) with 6 GB of memory and running Win-
dows 7 SP1.
The results show that there are two operational modes of
the module: Full Parallelization Mode (FPM) and Semi Paral-
lelization (SPM) one. The module selects the most appropri-
ate one according to the capabilities of the underlying
machine and to n_aligns, the number of alignments per
detection. It selects the FPM mode if the machine capabilities
match n_aligns so that the module achieves the best perfor-
mance. This is the most common mode in our experiments.
An example is the detection sessions of user 7 where
n_aligns ¼ 5, Fig. 13 shows that if five thread are used then
each thread runs a distinct alignment and this minimizes
the detection time.
The SPM mode is selected if the machine capabilities
do not match the required n_aligns. This results in a
small performance degradation due to inactive threads.
The SPM mode is rarely selected in our experiments
TABLE 6
TMBO Approach in Three Detection Approaches
Approach Name Avg-n-align NAC (1 user) NAC (50 users)
DDSGA with
L ¼ 145.73
5. 13 5. 13 Ã 145.73 Ã
100 ¼ 74759.49
74759.49 Ã
50 ¼ 3737974.5
SGA with Sig.
length ¼ 200
4.5 4.5 Ã 200 Ã
100 ¼ 90,000
90,000 Ã
50 ¼ 4,500,000
Traditional SGA
with Sig. length ¼ 200
49 49 Ã 200 Ã
100 ¼ 980,000
980,000 Ã
50 ¼ 49,000,000
TABLE 7
The Masquerade Detection Approaches against DDSGA
with Its Two Scoring Systems
Approach Name Hit Ratio percent False Positive
percent
Maxion-Townsend
Cost
DDSGA (Restricted
Permutation, With-
out Updating)
83.3 3.4 37.1
DDSGA (Restricted
Permutation, With-
out Updating) þ
TMBO
81.5 3.3 38.3
DDSGA (Free Per-
mutation)
80.5 3.8 42.3
SGA (Signature
updating)
68.6 1.9 42.8
SGA (Signature
updating) þ Heu-
ristic
66.5 1.8 44.3
Na€ıve Bayes (With
Updating)
61.5 1.3 46.3
SGA (Binary Scor-
ing, No Updating)
60.3 2.9 57.1
Fig. 12. The processes of the parallelized detection module.
Fig. 11. The impact of our TMBO approach on the system accuracy.
174 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
because the fifty users of the SEA data set results in a
value of Avg_n_align, equals to 5.13. Hence, on average,
the parallelized detection module uses 6 threads to run a
detection session. A SPM mode example is the detection
sessions of user 23 where n_aligns ¼ 9. Fig. 14 shows
that the shortest detection time is reached when running
6 threads in machine “A” and 8 threads in machine “B”.
In this case, 3 threads are idle in machine “A” and 1 in
machine “B”.
4.4 The Update Phase
The update of the user signature patterns is critical because
any IDS should be automatically updated to the new legal
behaviours of a user. This update is implemented by two
modules: the inline update module and the long term
update one.
4.4.1 The Inline Update Module
This module has two main tasks:
1) Finding areas in user signature subsequences to be
updated and augmented with the new user behavior
patterns.
2) Update the user lexicon by inserting new commands.
In the detection phase, after each alignment, each parallel
thread may update both the user signature subsequences
and the user lexicon. Three cases are possible in the TBA,
see Fig. 15:
1) The test sequence pattern matches the corresponding
signature subsequence pattern,
2) A gap is inserted into either or both sequences
3) There is at least a mismatch between the patterns in
the two sequences.
In case (a), no update is required because the alignment
has properly used the symbol in the proper subsequence to
find the optimal alignment.
Also case (b) does not require an update because symbols
that are aligned with gaps are not similar and should be
neglected.
In case (c), we consider all the mismatches within the cur-
rent test sequence. Then, both the signature subsequence
and the user lexicon are updated under two conditions. The
first one states that we can insert into the user signatures
only those patterns that are free of masquerading records.
This happens anytime the overall_alignment_score for the
current test sequence is larger than or equal to the detectio-
n_update_threshold. The second condition states that the
current test pattern should have previously appeared in
the user lexicon or belongs to the same functional group of
the corresponding signature pattern. The two conditions
imply that the inline module updates the user lexicon with
the new pattern if it does not belong to the lexicon. It also
extends the pattern with the current signature subsequence
and adds the resulting subsequence to the user signatures
without changing the original one. In other words, if a pat-
tern in the user lexicon or in the same functional group of its
corresponding signature pattern has participated in a con-
served alignment, then a new permutation of the behavior
of the user has been created. For instance, if the alignment
score of the test sequence in Fig. 16 is larger than or equal to
the detection_update_threshold, then the pattern ‘E’ at the
end of this test sequence has a mismatch with ‘A’ at the end
of the signature subsequence. If ‘E’ exists in the user lexicon
Fig. 13. FPM and SPM modes for user 7 in Machines “A” and “B”. Fig. 14. FPM and SPM modes for user 23 in Machines “A” and “B”.
Fig. 15. The inline update steps.
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 175
or belongs to the same functional group of ‘A’, the signature
subsequence is augmented so that the position with ‘A’
matches with both ‘A’ or ‘E’. If ‘E’ does not belong to the lex-
icon, it is also inserted into it. This simply embeds observed
variations in the signature sequence without destroying any
information it encodes. In this case, a new augmented sub-
sequence (HEE) is inserted into the user signature subse-
quences. If only the first condition is satisfied, only the user
lexicon is updated so that the following checks use the new
pattern. In fact, it is highly likely that a pattern that has
appeared within a conserved, high scoring alignment has
been created by the legitimate user.
Besides improving the computational performance of
system update, the module also improves the signature
update scheme of the Enhanced-SGA [3] as following:
1) It uses the parameters, the threshold, and the scoring
system returned by the configuration phase.
2) It runs in parallel with the detection phase and starts
as soon as the alignment score is computed. Instead,
the signature update scheme runs independently
after each detection process and it repeats the back-
ward tracing step.
3) It improves flexibility in the signature update by
considering any occurrence of commands permuta-
tions or functionality matching.
To evaluate how the inline update module reduces the
false alarm rates and improves the hit ratio, we have used
the ROC curve and the Maxion-Townsend score after apply-
ing the inline update module. Fig. 16 and Table 8 show that
the inline update module reduces the false alarm rates and
increases the hit ratio. Therefore, it significantly improves
the accuracy with respect to other approaches.
4.4.2 The Long Term Update Module
This module reconfigures the system parameters through
the outputs of the inline update module. There are three
strategies to run the module: periodic, idle time, and thresh-
old. The proper one is selected according to the characteris-
tic and requirements of the monitored system.
The periodic strategy runs the reconfiguration step with a
fixed frequency, i.e. 3 days or 1 week. To reduce the over-
head, the idle time strategy runs the reconfiguration step
anytime the system is idle. This solution is appropriate in
highly overloaded systems that require an efficient use of the
network and computational resources. The threshold strat-
egy runs the reconfiguration step as soon as the number of
test patterns inserted into the signature sequences reaches a
threshold that is distinct for each user and frequently
updated. This approach is highly efficient because it runs the
module only if the signature sequence is changed.
5 CONCLUSION AND FUTURE WORK
Masquerading is by far one of the most critical attacks
because an attacker that can successfully logs to a system
can also maliciously control it. The semi-global alignments
(SGA) is based upon sequence alignment and it is one of
the most effective detection techniques that can be applied
to distinct sequences of audit data. While SGA may result
in low false positive and missing alarms rates, even its
enhanced version has not yet achieved the level of accu-
racy and performance for practical deployment. This is
the reason underlying the design of the Data-Driven
Semi-Global Alignment Approach, DDSGA. From the
security efficiency perspective, DDSGA models more accu-
rately the consistency of the behaviour of distinct users by
introducing distinct parameters. Furthemore, it offers two
scoring systems that tolerate changes in the low-level
representation of the commands functionality by
Fig. 16. The impact of the inline update on the system accuracy.
TABLE 8
Masquerade Detection Approaches against DDSGA with Its Inline Update Module
Approach Name Hit
Ratio percent
False
Positive percent
Maxion-Townsend Cost
DDSGA (Inline Update þ TMBO þ Restricted Permutation) 88.4 1.7 21.8
DDSGA (Restricted Permutation, Without Updating) 83.3 3.4 37.1
DDSGA (Restricted Permutation, Without Updating) þ TMBO 81.5 3.3 38.3
DDSGA (Free Permutation) 80.5 3.8 42.3
SGA (Signature updating) 68.6 1.9 42.8
SGA (Signature updating) þ Heuristic 66.5 1.8 44.3
Na€ıve Bayes (With Updating) 61.5 1.3 46.3
SGA (Binary Scoring, No Updating) 60.3 2.9 57.1
176 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
categorizing user commands and aligning commands in
the same class without reducing the alignment score. The
scoring systems also tolerates both permutation of its
commands and changes in the user behaviour over time.
All these features strongly reduce false positive and miss-
ing alarm rates and improve the detection hit ratio. In the
experiments using the SEA data set, the performance of
DDSGA is always better than the one of SGA. From the
computational perspective, the Top-Matching Based Over-
lapping approach reduces the computational load of
alignment by decomposing the signature sequence into a
smaller set of overlapped subsequences. Furthermore, the
detection and the update processes can be parallelized
with no loss of accuracy.
For future work, we plan to apply our approach to detect
masquerade attacks in cloud environment by improing our
CIDS framework [4]. As a first step, we have developed a
new data set, CIDD [10] that includes distinct audit data
from distinct host operating systems and physical network
environment. This will supports an evaluation of DDSGA
that can use different kinds of audit sequences.
ACKNOWLEDGMENTS
This work was supported by QNRF (a member of Qatar
Foundation), Fayoum University, Pisa University, AFOSR
DDDAS award # FA9550-12-1-0241, and the US National
Science Foundation (NSF) awards IIP-0758579, DUE-
1303362, and SES-1314631.
REFERENCES
[1] A. H. Phyo and S. M. Furnell. “A detection-oriented classification
of insider it misuses,” in Proc. 3rd Security Conf. 2004.
[2] S. E. Coull, J. W. Branch, B. K. Szymanski, and E. A. Breimer,
“Intrusion detection: A bioinformatics approach,” in Proc. 19th
Annu. Comput. Security Appl. Conf., Las Vegas, NV, USA, Dec.
2003, pp. 24–33.
[3] S. E. Coulla and B. K. Szymanski, “Sequence alignment for
masquerade detection,” J. Comput. Statist. Data Anal., vol. 52,
no. 8, pp. 4116–4131, Apr. 2008.
[4] Hisham A. Kholidy and Fabrizio Baiardi, “CIDS: A framework
for intrusion detection in cloud systems,” in Proc. 9th Int.
Conf. Inf. Technol.: New Generations, Las Vegas, Nevada, USA,
Apr. 2012, pp. 16–18.
[5] (2001) [Online]. Available: http://www.schonlau.net/intrusion.
html
[6] M. Schonlau, W. DuMouchel, W. Ju, A. F. Karr, M. Theus, and Y.
Vardi, “Computer intrusion: Detecting masquerades,” Statist. Sci.
vol. 16, no. 1, pp. 58–74, 2001.
[7] “Greenberg: Using unix: Collected traces of 168 users,” Dept.
Comput. Sci., Univ. Calgary, Calgary, Canada, Res. Rep. 88/333/
45, 1988.
[8] T. Lane and C. E. Brodley, “An application of machine learning to
anomaly detection,” in Proc. 20th Nat. Inf. Syst. Security Conf.,
1997, pp. 366–380.
[9] RUU data set: (2008) [Online] Available: http://sneakers.cs.
columbia.edu/ids/RUU/data/.
[10] Hisham. A. Kholidy and Fabrizio Baiardi, “CIDD: A cloud intru-
sion detection data set for cloud computing and masquerade
attacks,” in Proc. 9th Int. Conf. Inf. Technol.: New Generations, Las
Vegas, NV, USA, Apr. 2012, pp. 16–18.
[11] R. A. Maxion and T. N. Townsend, “Masquerade detection using
truncated command lines,” in Proc. Int. Conf. Dependable Syst.
Netw., Washington, DC, USA, Jun. 2002, pp. 219–228.
[12] R. Posadas, J. C. Mex-Perera, R. Monroy, and J. A. Nolazco-Flores,
“Hybrid method for detecting masqueraders using session fold-
ing and hidden markov models,” in Proc. 5th Mexican Int. Conf.
Artif. Intell., 2006, pp. 622–631.
[13] W. Dumouchel. (1999). “Computer intrusion detection based on
Bayes Factors for comparing command transition probabilities”.
Technical report 91, National Institute of Statistical Sciences,
[Online]. Available: www.niss.org/downloadabletechreports.
html
[14] W. Ju and Y. Vardi. (1999). “A hybrid high-order Markov chain
model for computer intrusion detection”. Nat. Inst. Statist. Sci.
Research Triangle Park, NC, USA, Tech. Rep. 92. [Online]. Avail-
able: www.niss.org/downloadabletechreports.html
[15] Brian D. Davison and Haym Hirsh, “Predicting sequences of user
actions,” in Proc. Joint Workshop Predicting Future: AI Approaches
Time Ser. Anal., 1998, pp. 5–12.
[16] T. Lane and C. E. Brodley, “Approaches to online learning and
concept drift for user identification in computer security,” in Proc
4th Int. Conf. Knowl. Discovery Data Mining, New York, NY, USA,
Aug. 1998, pp. 259–263.
[17] B. Christopher, “A tutorial on support vector machines for pattern
recognition,” Data Mining Knowl. Discovery, vol. 2, no. 2, pp. 121–
167, 1998.
[18] B. Szymanski and Y. Zhang, “Recursive data mining for masquer-
ade detection and author identification,” in Proc. IEEE 5th Syst.,
Man .Cybern. Inf. Assurance Workshop, West Point, NY, USA, Jun.
2004, pp. 424–431.
[19] S. K. Dash, K. S. Reddy, and A. K. Pujari, “Episode based mas-
querade detection,” in Proc. 1st Int. Conf. Inf. Syst. Security, 2005,
pp. 251–262.
[20] A. Sharma and K. K. Paliwal, “Detecting masquerades using a
combination of Na€ıve Bayes and weighted RBF approach,” J. Com-
put. Virology, vol. 3, no. 3, pp, 237–245, 2007.
[21] Subrat Kumar Dash, K. S. Reddy, and K. A. Pujari, “Adaptive
Naive Bayes method for masquerade detection”, Security Com-
mun. Netw., vol. 4, no. 4, pp. 410–417, 2011.
[22] S. Malek and S. Salvatore, “Detecting masqueraders: A compari-
son of one-class bag-of-words user behavior modeling
techniques,” in Proc. 2nd Int. Workshop Managing Insider Security
Threats, Morioka, Iwate, Japan. Jun. 2010, pp. 3–13.
[23] A. S. Sodiya, O. Folorunso, S. A. Onashoga, and P. O. Ogundeyi,
“An improved semi-global alignment algorithm for masquerade
detection,” Int. J. Netw. Security, vo1. 12, no. 3, pp. 211–220, May
2011.
[24] T. F. Smith and M. S. Waterman, “Identification of common molec-
ular subsequences,” J. Molecular Biol., vol. 147, pp. 195–197, 1981.
[25] D. B. Christopher and T. D. Herbert, “Receiver operating charac-
teristic curves and related decision measures: A tutorial,” Chemo-
metrics Intell. Lab. Syst., vol. 80, pp. 24–38, 2006.
[26] K. Wang and S. J. Stolfo, “One class training for masquerade
detection,” in Proc. IEEE 3rd Conf. Workshop Data Mining Comput.
Secur., Florida, Nov. 2003.
Hesham AbdElazim Ismail Mohamed Kholidy
received the PhD degree in computer science
and engineering in a joint PhD program between
the University of Pisa in Italy and the University of
Arizona. He was an associate researcher in the
US National Science Foundation (NSF) Cloud
and Autonomic Computing Center, Electrical and
Computer Engineering Department, University of
Arizona, and a research fellow at Galileo Galilei
Doctorate School at the University of Pisa. He
recently published a new patent in Data Encryp-
tion in the United State Patent and Trademark Office (USPTO). He pub-
lished more than 25 papers in top journals and conferences, and he
supervises a group of master’s and PhD students. His research interests
include information security, distributed and high-performance systems
including Grid and Cloud computing systems, critical and SCADA sys-
tems, and bioinformatics.
KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 177
Fabrizio Baiardi received the degree from Uni-
versita di Pisa, where he is currently a full profes-
sor of computer science. He has chaired the
master’s degree on computer security, and
teaches some courses on the security of cloud
systems. His main research interests include for-
mal tools for the risk assessment and manage-
ment of cloud systems and critical infrastructures.
He is currently involved in several projects on
intrusion detection of cloud and SCADA systems.
Salim Hariri received the MSc degree from Ohio
State University in 1982 and the PhD degree in
computer engineering from the University of
Southern California in 1986. He is a professor
in the Electrical Communication Engineering
Department at the University of Arizona and the
director of the US National Science Foundation
(NSF) Center: Cloud and Autonomic Computing
Center. His current research focuses on high-per-
formance distributed computing, design and anal-
ysis of high-speed networks, and developing
software design tools for high-performance systems. He has coauthored
more than 200 journal and conference research papers, and is a coau-
thor/editor of three books on parallel and distributed computing. He is a
member of IEEE Computer Society.
 For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
178 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015

More Related Content

What's hot

Honeywords for Password Security and Management
Honeywords for Password Security and ManagementHoneywords for Password Security and Management
Honeywords for Password Security and ManagementIRJET Journal
 
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...IRJET Journal
 
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...IJCNCJournal
 
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKSSURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKSijcseit
 
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMS
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMSA SECURE SCHEMA FOR RECOMMENDATION SYSTEMS
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMSIJCI JOURNAL
 
IRJET- A Review of the Concept of Smart Grid
IRJET- A Review of the Concept of Smart GridIRJET- A Review of the Concept of Smart Grid
IRJET- A Review of the Concept of Smart GridIRJET Journal
 
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...cscpconf
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
 
Iaetsd a survey on detecting denial-of-service attacks
Iaetsd a survey on detecting denial-of-service attacksIaetsd a survey on detecting denial-of-service attacks
Iaetsd a survey on detecting denial-of-service attacksIaetsd Iaetsd
 
Secure intrusion detection and countermeasure selection in virtual system usi...
Secure intrusion detection and countermeasure selection in virtual system usi...Secure intrusion detection and countermeasure selection in virtual system usi...
Secure intrusion detection and countermeasure selection in virtual system usi...eSAT Publishing House
 
Ensure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEnsure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEditor IJCATR
 
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...IJCNCJournal
 
Securing information in wireless sensor networks
Securing information in wireless sensor networksSecuring information in wireless sensor networks
Securing information in wireless sensor networkseSAT Publishing House
 

What's hot (18)

Honeywords for Password Security and Management
Honeywords for Password Security and ManagementHoneywords for Password Security and Management
Honeywords for Password Security and Management
 
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
 
Ijmet 10 02_045
Ijmet 10 02_045Ijmet 10 02_045
Ijmet 10 02_045
 
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...
THE METHOD OF DETECTING ONLINE PASSWORD ATTACKS BASED ON HIGH-LEVEL PROTOCOL ...
 
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKSSURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
 
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMS
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMSA SECURE SCHEMA FOR RECOMMENDATION SYSTEMS
A SECURE SCHEMA FOR RECOMMENDATION SYSTEMS
 
IRJET- A Review of the Concept of Smart Grid
IRJET- A Review of the Concept of Smart GridIRJET- A Review of the Concept of Smart Grid
IRJET- A Review of the Concept of Smart Grid
 
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...
DETECTION OF APPLICATION LAYER DDOS ATTACKS USING INFORMATION THEORY BASED ME...
 
H364752
H364752H364752
H364752
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chain
 
Iaetsd a survey on detecting denial-of-service attacks
Iaetsd a survey on detecting denial-of-service attacksIaetsd a survey on detecting denial-of-service attacks
Iaetsd a survey on detecting denial-of-service attacks
 
Secure intrusion detection and countermeasure selection in virtual system usi...
Secure intrusion detection and countermeasure selection in virtual system usi...Secure intrusion detection and countermeasure selection in virtual system usi...
Secure intrusion detection and countermeasure selection in virtual system usi...
 
J1802035460
J1802035460J1802035460
J1802035460
 
Ensure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEnsure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay Networks
 
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
Privacy Preserving Reputation Calculation in P2P Systems with Homomorphic Enc...
 
Securing information in wireless sensor networks
Securing information in wireless sensor networksSecuring information in wireless sensor networks
Securing information in wireless sensor networks
 
Kg2417521755
Kg2417521755Kg2417521755
Kg2417521755
 
1762 1765
1762 17651762 1765
1762 1765
 

Viewers also liked

Cost-Effective Authentic and Anonymous Data Sharing with Forward Security
Cost-Effective Authentic and Anonymous Data Sharing with Forward SecurityCost-Effective Authentic and Anonymous Data Sharing with Forward Security
Cost-Effective Authentic and Anonymous Data Sharing with Forward Security1crore projects
 
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...1crore projects
 
Secure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved ReliabilitySecure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved Reliability1crore projects
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...1crore projects
 
Continuous and Transparent User Identity Verification for Secure Internet Ser...
Continuous and Transparent User Identity Verification for Secure Internet Ser...Continuous and Transparent User Identity Verification for Secure Internet Ser...
Continuous and Transparent User Identity Verification for Secure Internet Ser...1crore projects
 
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...1crore projects
 
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...1crore projects
 
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...1crore projects
 
Effective Key Management in Dynamic Wireless Sensor Networks
Effective Key Management in Dynamic Wireless Sensor NetworksEffective Key Management in Dynamic Wireless Sensor Networks
Effective Key Management in Dynamic Wireless Sensor Networks1crore projects
 
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...1crore projects
 
Designing High Performance Web-Based Computing Services to Promote Telemedici...
Designing High Performance Web-Based Computing Services to Promote Telemedici...Designing High Performance Web-Based Computing Services to Promote Telemedici...
Designing High Performance Web-Based Computing Services to Promote Telemedici...1crore projects
 
A Computational Dynamic Trust Model for User Authorization
A Computational Dynamic Trust Model for User AuthorizationA Computational Dynamic Trust Model for User Authorization
A Computational Dynamic Trust Model for User Authorization1crore projects
 
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...1crore projects
 
Key Updating for Leakage Resiliency with Application to AES Modes of Operation
Key Updating for Leakage Resiliency with Application to AES Modes of OperationKey Updating for Leakage Resiliency with Application to AES Modes of Operation
Key Updating for Leakage Resiliency with Application to AES Modes of Operation1crore projects
 
A Proximity-Aware Interest-Clustered P2P File Sharing System
A Proximity-Aware Interest-Clustered P2P File Sharing System A Proximity-Aware Interest-Clustered P2P File Sharing System
A Proximity-Aware Interest-Clustered P2P File Sharing System 1crore projects
 
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...Location-Aware and Personalized Collaborative Filtering for Web Service Recom...
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...1crore projects
 
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...1crore projects
 
Friendbook: A Semantic-Based Friend Recommendation System for Social Networks
Friendbook: A Semantic-Based Friend Recommendation System for Social NetworksFriendbook: A Semantic-Based Friend Recommendation System for Social Networks
Friendbook: A Semantic-Based Friend Recommendation System for Social Networks1crore projects
 
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...1crore projects
 
User-Defined Privacy Grid System for Continuous Location-Based Services
User-Defined Privacy Grid System for Continuous Location-Based ServicesUser-Defined Privacy Grid System for Continuous Location-Based Services
User-Defined Privacy Grid System for Continuous Location-Based Services1crore projects
 

Viewers also liked (20)

Cost-Effective Authentic and Anonymous Data Sharing with Forward Security
Cost-Effective Authentic and Anonymous Data Sharing with Forward SecurityCost-Effective Authentic and Anonymous Data Sharing with Forward Security
Cost-Effective Authentic and Anonymous Data Sharing with Forward Security
 
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...
Contributory Broadcast Encryption with Efficient Encryption and Short Ciphert...
 
Secure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved ReliabilitySecure Distributed Deduplication Systems with Improved Reliability
Secure Distributed Deduplication Systems with Improved Reliability
 
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
Asymmetric Social Proximity Based Private Matching Protocols for Online Socia...
 
Continuous and Transparent User Identity Verification for Secure Internet Ser...
Continuous and Transparent User Identity Verification for Secure Internet Ser...Continuous and Transparent User Identity Verification for Secure Internet Ser...
Continuous and Transparent User Identity Verification for Secure Internet Ser...
 
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...
Dynamic Routing for Data Integrity and Delay Differentiated Services in Wirel...
 
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...
Software Puzzle: A Countermeasure to Resource-Inflated Denial- of-Service Att...
 
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...
A Lightweight Secure Scheme for Detecting Provenance Forgery and Packet Drop ...
 
Effective Key Management in Dynamic Wireless Sensor Networks
Effective Key Management in Dynamic Wireless Sensor NetworksEffective Key Management in Dynamic Wireless Sensor Networks
Effective Key Management in Dynamic Wireless Sensor Networks
 
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...
Improved Privacy-Preserving P2P Multimedia Distribution Based on Recombined F...
 
Designing High Performance Web-Based Computing Services to Promote Telemedici...
Designing High Performance Web-Based Computing Services to Promote Telemedici...Designing High Performance Web-Based Computing Services to Promote Telemedici...
Designing High Performance Web-Based Computing Services to Promote Telemedici...
 
A Computational Dynamic Trust Model for User Authorization
A Computational Dynamic Trust Model for User AuthorizationA Computational Dynamic Trust Model for User Authorization
A Computational Dynamic Trust Model for User Authorization
 
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...
Maximizing P2P File Access Availability in Mobile Ad Hoc Networks though Repl...
 
Key Updating for Leakage Resiliency with Application to AES Modes of Operation
Key Updating for Leakage Resiliency with Application to AES Modes of OperationKey Updating for Leakage Resiliency with Application to AES Modes of Operation
Key Updating for Leakage Resiliency with Application to AES Modes of Operation
 
A Proximity-Aware Interest-Clustered P2P File Sharing System
A Proximity-Aware Interest-Clustered P2P File Sharing System A Proximity-Aware Interest-Clustered P2P File Sharing System
A Proximity-Aware Interest-Clustered P2P File Sharing System
 
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...Location-Aware and Personalized Collaborative Filtering for Web Service Recom...
Location-Aware and Personalized Collaborative Filtering for Web Service Recom...
 
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...
Privacy-Preserving and Truthful Detection of Packet Dropping Attacks in Wirel...
 
Friendbook: A Semantic-Based Friend Recommendation System for Social Networks
Friendbook: A Semantic-Based Friend Recommendation System for Social NetworksFriendbook: A Semantic-Based Friend Recommendation System for Social Networks
Friendbook: A Semantic-Based Friend Recommendation System for Social Networks
 
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...
Secure Data Aggregation Technique for Wireless Sensor Networks in the Presenc...
 
User-Defined Privacy Grid System for Continuous Location-Based Services
User-Defined Privacy Grid System for Continuous Location-Based ServicesUser-Defined Privacy Grid System for Continuous Location-Based Services
User-Defined Privacy Grid System for Continuous Location-Based Services
 

Similar to DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks

Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...
Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...
Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...Editor IJCATR
 
Intrusion detection system based on web usage mining
Intrusion detection system based on web usage miningIntrusion detection system based on web usage mining
Intrusion detection system based on web usage miningIJCSEA Journal
 
Parallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot netParallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot netredpel dot com
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
An intelligent system to detect slow denial of service attacks in software-de...
An intelligent system to detect slow denial of service attacks in software-de...An intelligent system to detect slow denial of service attacks in software-de...
An intelligent system to detect slow denial of service attacks in software-de...IJECEIAES
 
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...IEEEGLOBALSOFTSTUDENTSPROJECTS
 
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...IEEEMEMTECHSTUDENTPROJECTS
 
IRJET- Machine Learning based Network Security
IRJET-  	  Machine Learning based Network SecurityIRJET-  	  Machine Learning based Network Security
IRJET- Machine Learning based Network SecurityIRJET Journal
 
A new proactive feature selection model based on the enhanced optimization a...
A new proactive feature selection model based on the enhanced  optimization a...A new proactive feature selection model based on the enhanced  optimization a...
A new proactive feature selection model based on the enhanced optimization a...IJECEIAES
 
A system for denial of-service attack detection based on multivariate correla...
A system for denial of-service attack detection based on multivariate correla...A system for denial of-service attack detection based on multivariate correla...
A system for denial of-service attack detection based on multivariate correla...IGEEKS TECHNOLOGIES
 
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...IJCNCJournal
 
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...IJCNCJournal
 
International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)IJNSA Journal
 
International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)IJNSA Journal
 
Anomaly detection in the services provided by multi cloud architectures a survey
Anomaly detection in the services provided by multi cloud architectures a surveyAnomaly detection in the services provided by multi cloud architectures a survey
Anomaly detection in the services provided by multi cloud architectures a surveyeSAT Publishing House
 
.Net projects 2011 by core ieeeprojects.com
.Net projects 2011 by core ieeeprojects.com .Net projects 2011 by core ieeeprojects.com
.Net projects 2011 by core ieeeprojects.com msudan92
 

Similar to DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks (20)

Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...
Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...
Evasion Streamline Intruders Using Graph Based Attacker model Analysis and Co...
 
Bk4301345349
Bk4301345349Bk4301345349
Bk4301345349
 
Intrusion detection system based on web usage mining
Intrusion detection system based on web usage miningIntrusion detection system based on web usage mining
Intrusion detection system based on web usage mining
 
Parallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot netParallel and distributed system projects for java and dot net
Parallel and distributed system projects for java and dot net
 
M0446772
M0446772M0446772
M0446772
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Sub1582
Sub1582Sub1582
Sub1582
 
An intelligent system to detect slow denial of service attacks in software-de...
An intelligent system to detect slow denial of service attacks in software-de...An intelligent system to detect slow denial of service attacks in software-de...
An intelligent system to detect slow denial of service attacks in software-de...
 
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...
2014 IEEE DOTNET PARALLEL DISTRIBUTED PROJECT A system-for-denial-of-service-...
 
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...
IEEE 2014 DOTNET PARALLEL DISTRIBUTED PROJECTS A system-for-denial-of-service...
 
IRJET- Machine Learning based Network Security
IRJET-  	  Machine Learning based Network SecurityIRJET-  	  Machine Learning based Network Security
IRJET- Machine Learning based Network Security
 
A new proactive feature selection model based on the enhanced optimization a...
A new proactive feature selection model based on the enhanced  optimization a...A new proactive feature selection model based on the enhanced  optimization a...
A new proactive feature selection model based on the enhanced optimization a...
 
A system for denial of-service attack detection based on multivariate correla...
A system for denial of-service attack detection based on multivariate correla...A system for denial of-service attack detection based on multivariate correla...
A system for denial of-service attack detection based on multivariate correla...
 
WS97-07-013
WS97-07-013WS97-07-013
WS97-07-013
 
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
DDOS ATTACKS DETECTION USING DYNAMIC ENTROPY INSOFTWARE-DEFINED NETWORK PRACT...
 
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
DDoS Attacks Detection using Dynamic Entropy in Software-Defined Network Prac...
 
International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)
 
International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)International Journal of Network Security & Its Applications (IJNSA)
International Journal of Network Security & Its Applications (IJNSA)
 
Anomaly detection in the services provided by multi cloud architectures a survey
Anomaly detection in the services provided by multi cloud architectures a surveyAnomaly detection in the services provided by multi cloud architectures a survey
Anomaly detection in the services provided by multi cloud architectures a survey
 
.Net projects 2011 by core ieeeprojects.com
.Net projects 2011 by core ieeeprojects.com .Net projects 2011 by core ieeeprojects.com
.Net projects 2011 by core ieeeprojects.com
 

Recently uploaded

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Recently uploaded (20)

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks

  • 1. DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks Hisham A. Kholidy, Fabrizio Baiardi, and Salim Hariri, Member, IEEE Computer Society Abstract—A masquerade attacker impersonates a legal user to utilize the user services and privileges. The semi-global alignment algorithm (SGA) is one of the most effective and efficient techniques to detect these attacks but it has not reached yet the accuracy and performance required by large scale, multiuser systems. To improve both the effectiveness and the performances of this algorithm, we propose the Data-Driven Semi-Global Alignment, DDSGA approach. From the security effectiveness view point, DDSGA improves the scoring systems by adopting distinct alignment parameters for each user. Furthermore, it tolerates small mutations in user command sequences by allowing small changes in the low-level representation of the commands functionality. It also adapts to changes in the user behaviour by updating the signature of a user according to its current behaviour. To optimize the runtime overhead, DDSGA minimizes the alignment overhead and parallelizes the detection and the update. After describing the DDSGA phases, we present the experimental results that show that DDSGA achieves a high hit ratio of 88.4 percent with a low false positive rate of 1.7 percent. It improves the hit ratio of the enhanced SGA by about 21.9 percent and reduces Maxion-Townsend cost by 22.5 percent. Hence, DDSGA results in improving both the hit ratio and false positive rates with an acceptable computational overhead. Index Terms—Masquerade detection, sequence alignment, security, intrusion detection, attacks Ç 1 INTRODUCTION A masquerader is an attacker who authenticates as a legal user by stealing its credentials or by violating the authentication service. An insider masquerader is a legal system user that misuses his/her privileges to access dis- tinct accounts and perform unauthorized actions. An out- sider aims to utilize all the privileges of a legal user. Alternative implementations of this attack [1] do exist, such as duplication or ex-filtration of user password, installation of software with backdoors or malicious code, eavesdrop- ping and packet sniffing, spoofing and social engineering attacks. These attacks may leave some trail in log files that, after the fact, can be linked to some user. In this case, a log analysis by a host-based IDS remains the state-of-the art to detect these attacks. Attacks that do not leave an audit trail in the target system may be discovered by analyzing the user behaviors through masquerade detection. At first, mas- querade detection builds a profile for each user by gathering information such as login time, location, session duration, CPU time, commands issued, user ID and user IP address. Then, it compares these profiles against logs and signals as an attack any behaviour that does not match the profile. The current detection approaches have not achieved the level of accuracy and performance for practical deployment in spite of the large amount of information they used to build a pro- file such as command line commands, system calls, mouse movements, opened files names, opened windows title, and network actions. Semi-global alignment (SGA) [2] is one of the most efficient detection algorithms and its accuracy was improved by Coull et al. [3]. We refer to this new improve- ment as “Enhanced-SGA”. This paper introduces the Data- Driven Semi-Global Alignment (DDSGA) approach, which improves both the detection accuracy and the computa- tional performance of the Enhanced-SGA and of HSGAA [4] that is also based upon SGA. The main idea underlying DDSGA is to consider the best alignment of the active ses- sion sequence to the recorded sequences of the same user. After discovering the misalignment areas, we label them as anomalous and several anomalous areas are a strong indica- tor of a masquerade attack. DDSGA improves the security efficiency by using not only lexical matching such as string matching or longest common substring searches, but also by tolerating small mutations in the sequences with small changes in the low-level representation of the user com- mands. To this purpose, a command can be aligned with one that implements the same functionalities. To increase the hit ratio and reduce both false positive and false nega- tive rates, DDSGA pairs each user with distinct gap inser- tion penalties according to the user behavior. Furthermore, it improves both the alignment scoring system and the update phase of Enhanced-SGA to tolerate changes in behaviours without significantly reducing the alignment score. To reduce both the runtime overhead and the mas- querader live time inside the system, DDSGA implements the detection and update operations in parallel threads and simplifies the alignment. After highlighting the SEA dataset [5] that we use to compare DDSGA against other H.A. Kholidy is with the Faculty of Computers and Information, Fayoum University, Fayoum, Egypt, the Department of Computer Sci- ence and Engineering, Qatar University, Doha, Qatar, the NSF Cloud and Autonomic Computing Center, College of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona, and the Dipar- timento di Informatica, Universita di Pisa, Pisa, Italy. E-mail: hisham_dev@yahoo.com. F. Baiardi is with the Department of Computer Science, Pisa University, Pisa 56017, Tuscany, Italy. E-mail: baiardi@di.unipi.it. S. Hariri is with the NSF Cloud and Autonomic Computing Center, College of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona. E-mail: hariri@ece.arizona.edu. Manuscript received 3 June 2012; revised 27 Jan. 2014; accepted 2 May 2014. Date of publication 2 June 2014; date of current version 13 Mar. 2015. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TDSC.2014.2327966 164 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015 1545-5971 ß 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
  • 2. techniques, Section 2 briefly reviews masquerade detection. Section 3 discusses the SGA algorithm. Section 4 introduces DDSGA and describes its three main phases namely, config- uration, detection, and update. It also describes the modules and experimental results for each phase and compares them against other approaches. Lastly, Section 5 draws some con- clusion and outlines future work. 1.1 SEA Dataset Currently, five data sets may be used to evaluate masquerade detection techniques: SEA, Greenberg, Purdue, RUU, and our CIDD dataset [5], [7], [8], [9], and [10]. The SEA dataset has become the de-facto standard because it has been used by many researchers working on masquerade detection. In prin- ciple, this simplifies the comparison of distinct approaches. SEA consists of commands and user names collected from UNIX acct audit data. The data describe 50 different users each issuing 15,000 commands. The first 5,000 commands of each user are assumed to be genuine. The remaining com- mands of a user are splitted into 100 blocks of 100 commands each. Each block maybe seeded with masquerade users, i.e. with commands of other users. There is a 1 percent probabil- ity that a block is a masquerader and, in this case, there is a probability of 80 percent that the next one is a masquerader as well. As a result, approximately 5 percent of the test data are masquerades. One of the most critical defects of SEA is that it neglects command arguments and parameters. Due to the way acct collects audit data, it is difficult to distinguish commands typed by human users from those generated by shell scripts. Due to the repetitive execution of shell scripts, for some users this may result in a very regular pattern with a few commands. To address some methodological shortcom- ings, two alternative configurations of SEA, namely 1v49 [11] and SEA-I [12], have been defined but they have not been widely used and do not truthfully represent real world mas- querader activities. 2 RELATED WORK IN MASQUERADE DETECTION We briefly outline some masquerade detection approaches. The uniqueness approach [6] assumes that commands that have not been seen in the training data indicate a masquer- ader. Moreover, the probability that a masquerader has issued a command is inversely related to the number of users that use such a command. While uniqueness has a relatively poor performance, it is one of the few approaches that target false alarm rate of 1 percent. Na€ıve Bayes One-step Markov [13] is based upon one-step transi- tions from a command to the next. It builds two transition matrices for each user from, respectively, the training data- base and the testing one and it triggers an alarm when these matrices noticeably differ. The false alarm rate of this method is not satisfactory. The Hybrid Multi-Step Markov method [14] is based on Markov chains. When a Markov model cannot be adopted because too many commands in the testing data have not been observed in the training, a simple independence model with probabilities estimated from a contingency table of users versus commands may be more appropriate. Schonlau et al. [6] toggled between a Markov model and the simple independence one. This approach achieves the best performance among the considered methods. The main idea underlying the com- pression approach [6] is that new and old data from the same user should compress at about the same ratio. Instead, data from a masquerading user will compress at a different ratio. Among the proposed methods, this results in the worst performance. Incremental Probabilistic Action Modeling (IPAM) [15] is based upon one-step command transition. It estimates the probability of each transition from the training data set and uses it to predict the sequence of user commands. Too many false predictions signal a masquerader. This method is in the lowest-per- forming group. Sequence-matching [16] computes a simi- larity match between the user profiles and the corresponding sequence of commands. Any score lower than a threshold signals a masquerader. Its performance on the SEA data set is not very high. Support Vector Machine (SVM) [17] denotes a set of machine learning algorithms for binary data classification. It exploits a set of support vectors in the training data that outlines a hyper- plane in feature space [17]. SVM can potentially learn a large set of patterns but it results in high false alarm rates and a low detection rate. Furthermore, the user profile has to be updated to reduce false alarms. Szymanski and Zhang [18] propose a recursive data mining approach that discovers frequent patterns in the sequence of user com- mands, encodes them with unique symbols, and rewrites the sequence with the new coding. Then, a one-class SVM classifier detects masqueraders. This approach demands mixing user data and may not be ideal or easily imple- mented in real-world. It also suffers of some of the SVM shortcomings. Maxion and Townsend [11] applied a Na€ıve Bayes classifier widely used in text classification tasks and that classifies sequences of user-command data into either legitimate or masquerader. The method has not yet achieved the level of accuracy for practical deployment. Dash et al. [19] introduced an episode based Na€ıve Bayes technique that extracts meaningful episodes from a long sequence of commands. The Na€ıve Bayes algorithm identi- fies these episodes either as masquerade or normal accord- ing to the number of commands in masquerade blocks. The proposed technique significantly improves the hit ratio but it still has high false positive rates and it does not update the user profile. Alok et al. [20] integrates a Na€ıve Bayes approach with one based on a weighted radial basis function, WRBF, similarity. The Na€ıve Bayes algorithm includes information on the probabilities of commands by one user over the other users. Instead, the WRBF similarity takes into account the similarity measure based on the fre- quency of commands, f, and the weight associated with the frequency vectors. Here, f is a similarity score between an input frequency vector and a frequency vector from the training data set. The experiments confirm that WRBF-NB significantly improves the hit ratio but, as the previous approach, it suffers from the high false positive rates. Fur- thermore, it increases the overall overhead by computing both the Na€ıve Bayes and the WRBF and integrating their results. Lastly, it does not update the user profile and neglects the low level representation of user commands. Dash et al. [21] introduced an adaptive Na€ıve Bayes approach based on the premise that both the commands of a legitimate user and those of an attacker may differ from KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 165
  • 3. the trained signature but the deviation of the legitimate user is momentary, whereas the attacker one persists lon- ger.The improvement in the performance of detection has been empirically verified using several data sets. However, the false positive rate is still high. Malek and Salvatore [22] have modeled user OS commands as bag-of-words without timing information. They used a one-class support vector machine to achieve a better performance than threshold- based comparison with a distance metric. The ability of sequence alignment algorithms to find areas of similarity can be exploited to differentiate legitimate usage from mas- querade attacks. To do so, a signature of the normal user behavior should be created by collecting sequences of audit data. Then, this signature is aligned with audit data from monitored sessions to find areas of similarity. Areas that do not align properly are assumed to be anomalous, and several anomalous areas are a strong indicator of masquer- ade attacks [2]. Among sequence alignment algorithms such as global, local and semi-global alignments, the most efficient one is semi-global alignment. Adesina et al. [23] modified the scoring function of the semi-global alignment algorithm to improve the detection efficiency. They used a systematically generated ASCII coded sequence audit data from Windows and UNIX systems as simulations for the intrusion data set. However, a real time evaluation using one of the current data sets is missing. Coull et al. [2] modified the Smith-Waterman align- ment algorithm [24] to a semi-global alignment algorithm that is described with some evolutions and enhancements [3] in the next section. Fig. 1 compares different techni- ques in terms of the Receiver Operator Characteristic (ROC) curves [3], [25] and the Maxion-Townsend cost function [11] and it shows that SGA achieves the best performances. The Maxion-Townsend cost function rates a masquerade detection algorithm and defines the detec- tion cost as in Eq. (1). Cost ¼ 6 Ã False Positive Rate þ ð100 À HitRatioÞ: (1) 3 SGA AND THE ENHANCED-SGA This section describes in some details SGA and some pro- posals to enhance it. SGA is more accurate and efficient than current approaches. It has low false positive and missing alarm rates and high hit ratio. It can be adopted in heterogeneous environment with distinct operating system because it can be applied to distinct audit data such as command line entries, mouse movements, system calls, registry events, file and folder names, sequence of opened windows titles and network access audit data. SGA aligns large sequence areas as in global alignments, while preserving the nature of local alignments. It can ignore both prefixes and suffixes and it only aligns the conserved area with the maximal similarity. Fig. 2 shows an application of SGA and the influential parameters of an alignment namely: match score, mismatch score, test_gap penalty, signature_gap penalty, and detec- tion threshold. To discover the optimal alignment, SGA exploits dynamic programming. To this purpose, it initializes an m þ 1 by n þ 1 score matrix, M, and then determines the value of each position of M by one of three transitions, see Fig. 3: 1) Diagonal transition. It aligns the i À 1 symbol in the signature sequence with the j À 1 symbol in the test sequence. The alignment score depends upon the lexical match of the symbols being aligned and it is added to Mði À 1; j À 1Þ. 2) Vertical transition. A gap is inserted into the signa- ture sequence and it is aligned with the jÀ1 sym- bol in the test sequence. The gap penalty is added to Mði; j À 1Þ. 3) Horizontal transition. A gap is inserted into the test sequence and aligned with the iÀ1 symbol in the Fig. 1. ROC curves for detection techniques that use SEA Dataset. Fig. 2. An alignment example using SGA algorithm. Fig. 3. The three transitions to fill each cell in the transition-matrix. 166 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 4. signature sequence. The gap penalty is added to Mði À 1; jÞ. The SGA scoring system determines the gap penalties in transitions 2 and 3. The actual alignment depends upon the maximum value of the three transitions and its value is assigned to M(i, j). M(i, j) is the score of the optimal align- ment of all symbols up to location iÀ1 in the signature sequence and jÀ1 in the test one. Hence, M(m, n) as the score of the optimal alignment of the two sequences. We can rebuild this alignment by tracing back the transitions that have produced the score. M(m, n) measures the similar- ity of the two sequences according to the scoring system used and it is an indicator of masquerade attacks. 3.1 The Enhanced-SGA Coull and Syzmanski [3] modified the SGA algorithm to handle the problems of the traditional Smith-Waterman alignment algorithm from two perspectives. The first one considers that the usage patterns of legal users may change due to changes in their role or to new software. A static user signature is therefore prone to label as attacks some variations of legal users. To avoid these false positives, the signature is updated as new behaviour is encountered by exploiting the ability of SGA of discover- ing areas of similarity. Furthermore, as discussed in Section 4.2.3, they defined two scoring systems, the command grouping and binary scoring systems, to set the alignment scores and the gap insertion penalties. The signature update scheme is applied with the binary scoring, their most efficient system. This scheme augments both the current signature sequence with information on the new behaviours and the user lexicon with the new commands the user invokes. The scheme also introduces a threshold for each user profile to ensure that both the signature sequence and user lexicon remain free of tainted commands from masquerade attacks. The threshold is used in both detection and update processes, and it is built through a snapshot of the user signatures. The other perspective considers that the Smith-Waterman algorithm is computationally expensive and impractical to detect mas- querade attacks on multi-user systems. By selectively align- ing only the portions of the user signature with the highest success probability, Heuristic Aligning [3] can significantly reduce the computational overhead with almost no loss of accuracy in detection. These modifications have been tested on the SEA data set to simplify the comparison with other approaches. 4 THE DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH DDSGA is a masquerade detection approach based upon Enhanced-SGA [3]. It aligns the user active session sequence to the previous ones of the same user and it labels the misalignment areas as anomalous. A masquer- ade attack is signaled if the percentage of anomalous areas is larger than a dynamic, user dependent threshold. DDSGA can tolerate small mutations in the user sequences with small changes in the low level representation of user commands and it is decomposed into a configuration phase, a detection phase and an update one. The configuration phase, computes, for each user, the align- ment parameters to be used by both the detection and update phases. The detection phase aligns the user current session to the signature sequence. The computational per- formance of this phase is improved by two approaches namely the Top-Matching Based Overlapping (TMBO) and the parallelized approach. In the update phase, DDSGA extends both the user signatures and user lexicon list with the new patterns to reconfigure the system parameters. Fig. 4 shows these phases and the modules that implement them that we discuss in the following. 4.1 DDSGA Main Features and Improvements DDSGA improves both the computational and the security efficiency of the Enhanced-SGA. From a computational perspective, DDSGA improves the performance of both the detection and the update through a parallel multithreading scheme and a new Top-Matching Based Overlapping approach that improves the Heuristic Aligning and saves computational resources. While the Heuristic Aligning splits the signature sequence into a fixed overlapped subsequences of size 2n, where n is the size of the test sequence, TMBO simplifies the alignment through shorter overlapped subsequences. Besides saving computa- tional resources, this speeds up the detection and update phases and consequently reduces the masquerader live time inside the system. With respect to the accuracy of masquerade detection, DDSA introduces distinct scoring parameters for each user, namely the gap penalties and the overlapping length. The adoption of distinct scoring parameters for each user improves the detection accuracy, the false positive and false negative rates and increase the detection hit ratio with respect to the traditional and enhanced SGA that use the same parameters for any user. This neglects differences among the behaviours of distinct users and reduces the accuracy of detection, because the alignment cannot tolerate even slight changes in the user behaviour over time. Start- ing from the data of each user, the configuration phase of DDSGA computes the scoring parameters that result in the maximum alignment score for the considered user. To improve the accuracy of the alignment, DDSGA integrates binary and command group, two scoring systems suggested by Coull et al.’s, into two other scoring systems, restricted and free permutation. The resulting systems tolerate permu- tations of previously observed patterns with a low Fig. 4. DDSGA Phases and modules. KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 167
  • 5. reduction of the score. To tolerate changes in the low-level representation of commands with the same functionality, they classify user commands into several groups and align two commands in the same group without reducing the alignment score. The DDSGA configuration phase also cre- ates a dynamic threshold for each user to be used by both the detection phase and the update one. While Enhanced- SGA builds this threshold through a snapshot of a user profile, DDSGA builds a more sensitive and dynamic threshold by considering any data in the profile. Further- more, DDSGA runs two update modules: the inline and long term modules. The inline module updates the user sig- nature patterns, the user lexicon list, and their correspond- ing command categories in a reconfiguration phase. The long-term module updates the system with the latest changes in the alignment parameters. It also updates the dynamic threshold values, scoring parameters, and the overlapping length i.e., the length of the overlapped signa- ture subsequence according to maximum number of inserted test gaps. The dynamic threshold, the scoring sys- tems, and the two update modules enable DDSGA to toler- ate slight changes in the user behaviour overtime. Table 1 briefly compares DDSGA and Enhanced-SGA. In the following sections we detail the current implemen- tation of DDSGA. 4.2 The Configuration Phase We briefly define the parameters that this phase computes for each user and that are used by the following phases. The detailed calculation of each parameter is described in the remainder of this section. Optimal gap penalties. The optimal test gap penalty and the optimal signature gap penalty are paid when inserting a gap into the test sequence and the signature one respectively. While in Enhanced-SGA all the users share the same fixed penalties, DDSGA computes two distinct penalties for each user accord- ing to distinct user behaviours. In this way, DDSGA determines the smallest penalties corresponding to the maximum alignment score. Mismatch score. DDSGA evaluates the mismatch score through the restricted permutation scoring system and the free permutation one. These systems integrate the command grouping and binary scoring systems defined in [3]. Average optimal threshold. DDSGA computes a dis- tinct threshold value for each user according to the changes in the user behavior. The threshold is used in both the detection and update phases and its sen- sitivity affects the accuracy of both phases, as dis- cussed in the average threshold module. Maximum factor of test gaps (mftg). This parameter relates the largest number of gaps inserted into the user test sequences to the length of these sequences. DDSGA computes a distinct parameter for each user and updates it in the update phase. The detection phase uses the parameter to evaluate the maximum length of the overlapped signature subsequences in the TMBO. The configuration phase is implemented using the fol- lowing five modules. 4.2.1 DDSGA Initialization Module To provide an independent set of test and signature sequen- ces for the configuration phase of each user, we split the user signatures into nt non-overlapped-blocks each of length n and use them as test sequences to the user. These sequences represent all given combinations of users signa- ture sequences and all the modules in the configuration phase use them to compute the user alignment parameters. These sequences differ from those used in the detection phase. To define the signature sequences, we divide the user signature sequence into a set of overlapped groups of length m ¼ 2n. In this way, the last n symbols of a block also appear as the first n of the next one. ns, the number of signature subsequences is equal to nt-1 groups to consider all possible adjacent pairs of the signature sequences of size n. We have chosen a length 2n to overlap the signature sequence because any particular alignment uses subsequen- ces with a length that is, at most, 2n. Any longer subse- quence necessary scores poorly, because of the number of gaps to be inserted. In fact, since the scoring alignment depends upon the match between the test and the signature subsequences, the former should be shorter than the latter. As a consequence, the signature sequence for this phase consists of 2n command produced by overlapping the sig- nature sequence. Hence, there are nt-1 groups that are cre- ated as in Fig. 5. In the case of SEA, ns ¼ 49 subsequences and a tested block consists of 100 commands because SEA marks each 100 command block as an intrusion or a non-intru- sion. Since SEA does not supply any information on which commands in a block correspond to the intrusion, the correctness of larger or smaller blocks cannot be checked. The dynamic average threshold for both the detection and update phases is the average score of all the alignments between each test sequence of length 100- command and the 49 overlapped 2n signature subse- quences. A test sequence is not aligned with the signa- ture subsequence that contains the test sequence itself because this returns a 100 percent alignment score. We will detail this step in the average threshold module. TABLE 1 A Comparison between DDSGA and Enhanced-SGA Enhanced-SGA DDSGA Accuracy 1- Command group- ing and binary scor- ing systems 2- Fixed and equal gab insertion penal- ties for all the users. 3- Compute the threshold for each user using snapshots of user data. 4- Signature update. 1- Free and restricted per- mutation scoring systems 2- Define gab insertion penalties for each user independently, based on user previous data. 3- Compute an optimal threshold for each user with all user signature sequences. 4- Signature update (inline and long term). Computational Performance Heuristic Aligning. 1-TMBO Approach. 2-Parallel computation. 168 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 6. 4.2.2 User’s Lexicon Categorization Module This module builds a lexicon for each user, i.e. list of lexical patterns classified according to their functionality and that is used to tolerate changes in the low level representation of a pattern. In SEA data set, these patterns are UNIX com- mands. This module combines the user lexicon list and com- mand grouping approach introduced in [3]. Table 2 shows an example of a classification of user com- mands into several categories according to their functionali- ties e.g., since a user can use either cat or vi to write a file, the two commands can be aligned because both belong to the same group, “Text processing”. In the same way, grep can be aligned with find because they both belong to “searching”. 4.2.3 Scoring Parameters Module Starting from the test and signature subsequences of each user, this module returns three parameters: optimal test gap penalty, optimal signature gap penalty, and mismatch score. At first, the module inserts into the list top_match_list all the test sequences with the top match score. This list enables DDSGA to align the top match test sequences only rather than all the nt sequences. To build the top_match_list, we select the highest match scores for all the nt sequences. The match score MS of a test sequence is computed as in Eq. (2). Then, the top_match_list sequences are aligned to the ns overlapped signature subsequences using any possible gap penalty, i.e. the test gap penalty ranges from 1 to n, while the signature gap penalty ranges from 1 to n. The mismatch score is 0 and the match score is þ2. MS ¼ Xn i¼1 Xnt k¼1 MinðNoccur Itself pið Þ; Noccur seqkðpiÞÞ; (2) where: - n is the length of the test sequence, nt is number of test sequences, - Noccur_Itself (pi) is number of occurrence of pattern i in the current evaluated sequence, - Noccur_Seq k(pi) is number of occurrence of pattern i in test sequence k. By computing each alignment separately, we produce sev- eral alignment scores for each combination of the scoring parameters and select as the optimal parameters those resulting in the maximum score. If several alignments result in this score, we select the one with the smallest penalties of inserting a gap into the signature subsequence and test one, respectively. To justify this solution, consider that the high- est alignment score denotes a high level of alignment between the test and the signature subsequences and SGA normally subtracts these penalties from the score. We have applied the previous steps to each user in SEA data set to define the corresponding optimal penalties. Fig. 6 and Table 3 show the penalties for user 1. This algorithm computes the mismatch_score parame- ter using the restricted and the free permutation scoring systems as shown in Figs. 7a and 7b. Both systems inte- grate command grouping and binary scoring [3] and are Fig. 5. The non-overlapped test sequences and the overlapped signa- ture subsequences. TABLE 2 User 1 Lexicon List User’s Lexicon List Functional Group cat File system, Text processing vi Text processing sh Shell programming env User environment kill Process Management reaper Networking mail Internet Applications hostname System Information nawk Development, Shell Programming getopt Development, Shell Programming grep Searching, Text processing find Searching Fig. 6. The best alignment score that corresponds to the optimal combi- nations of gap penalties for user 1 in SEA Dataset. TABLE 3 Example of the Top Match Scores of User 1 Max align- ment score Test sequence index (i) Signature gap penalty Test gap penalty Optimal combina- tion 154 2 99 95 - 154 5 100 95 - 154 8 97 100 - 154 13 97 99 ok KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 169
  • 7. based on distinct assumption about mutations in the audit data. Command grouping assumes that sequences of audit data mutate by replacing some original symbols with func- tionally similar ones. Command grouping assigns a static reward of +2 to exact matches and scores a mismatch through the functional groups of the two commands. If a command in the signature aligns with a mismatched com- mand in the test sequence but that belongs to the same group, the mismatch score is set to +1 rather than to À1. The assumption on mutation of the binary scoring system fol- lows the results in [26] where mutation does not replace base symbols in the user lexicon. In fact, these symbols are a strong indicator of a legal use, but the original base symbols can be permuted in some fashion [3]. The binary scoring system rewards exact matches by adding +2 to the align- ment score. A mismatch is scored to +1 if the mismatched command has previously occurred in the user lexicon and to À1 otherwise. Both scoring systems penalize the insertion of a gap into the signature sequence and into the test one by, respectively, À3 and À2. Restricted permutation scoring system. It rewards a mis- matched command in the test sequence if the two commands belong to the same group. These groups are manually cre- ated from a set of common UNIX commands in the signature sequences. Furthermore, the mismatched command of the test sequence should have previously occurred in the user lexicon. This tolerates various permutations of previously observed patterns without reducing the score significantly. Free permutation scoring system. This system is more toler- ant than the previous one because it does not require that the mismatched commands belong to the same group and it even rewards a mismatched command provided that it belongs to the user lexicon. This tolerates a larger number of permutations of the signature patterns without reducing the score significantly. 4.2.4 Average Threshold Module This module computes a dynamic average threshold for each user to be used in the detection phase and that may be updated in the update phase. In the detection phase, if thez alignment score is lower than the threshold, then the behavior is classified as a masquerade attack. While the Enhanced-SGA only considers snapshots of the user data, this module considers all the user data. This improves the sensitivity of the threshold that, in turns, determines the one of the detection. The module uses the same test and sig- nature subsequences of the initialization module and it can work with a test sequence of any length because, in practical deployment user test session can be of any length. At first, the module builds a trace-matrix, to record all alignment details of the current user. We align each test sequence to all ns overlapped signature subsequences and run the module twice, one for each scoring systems to select the one to be used. The detection phase uses the output of this module to compare the two scoring systems. These alignments use the optimal scoring parameters for the current user. The mod- ule applies Eq. (3) to compute the average alignment of test sequence i, avg_align_i, and the sub_average score for all previous alignment scores, score_align_i, of test sequence i. In the equation, max_score_align_i is the largest alignment score resulting from the alignment of sequence i to all ns sig- nature subsequences. Then, the module applies Eq. (4) to compute the detection_update_threshold, the overall aver- age of the nt sub average scores. avg align i ¼ Pns j¼1 score align ij =ns max score align i à 100 (3) detection update threshold ¼ Pnt k¼1 avg align ik nt (4) To compute the optimal alignment path and number of test gaps (ntg) to be inserted into test sequence i, we apply the trace backward algorithm (TBA) to trace back the transi- tions to derive each optimal score. An example is shown in Table 4. The Trace backward algorithm. Any dynamic programming algorithm can select the optimal alignment path by tracing back through the optimal scores computed by SGA. The TBA implements one of the known dynamic programming algorithms to build the backward-transition-matrix, see Fig. 8, in a way that helps in filling the trace-matrix in Fig. 7. (a) The restricted permutation scoring system, (b) The free per- mutation scoring system. 170 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 8. Table 4 as previously discussed. The alignment process also applies the TBA to extract the final alignment path. The TBA traces back the transition-matrix according to the labels that the alignment has inserted into this matrix. One of four labels may be inserted: “M” if a match has occurred, “!M” if a mismatch has occurred, “GS” or “GT” if a gap has been inserted into a signature subsequence or into the test one. Fig. 8 outlines the TBA and shows the transition-matrix and the corresponding backward-transition-matrix to align the test sequence “AWGHE” to the signature sequence “PAWHE”. 4.2.5 Maximum Test Gap Module We recall that the Enhanced-SGA Heuristic Aligning decomposes the signature subsequence into 2n overlapped subsequences because if subsequences of length n are aligned, the maximum number of gaps that can be inserted into the test sequence is n for all users. By tracing the SGA algorithm, we have noticed that the maximum number of gaps is much lower than n, the length of the test sequence, and it differs for each user according to the level of similar- ity among the subsequences in the user signature and to the length of the test sequence. Even if the test sequence is long enough, the number of gaps is at most half of the sequence length. This means that by partitioning the signature sequence into 2n overlapped subsequences, the Maximum Test Gap module can divide it as in Eq. (5) to compute mftg, the largest number of test gaps inserted into the user test sequences by the average threshold module. The Computa- tional Enhancement (CE) for the session alignment of a user can be computed according to Eq. (6). We refer to the detec- tion phase for an example that explains this section. L ¼ n þ Max nt k ¼ 1 ntgk ltsk à n ; (5) where: - ntg is number of test gaps inserted to each test sequence, - lts is the length of the test sequence, - nt is number of the test sequences of the user, fifty in case of SEA, CE ¼ ðn À mftgd eÞ Ã ð100=ð2 à nÞÞ% (6) Where, Maximum Factor of Test Gaps inserted into all user test sequences (mftg) is computed as in Eq. (7). mftg ¼ Max nt k ¼ 1 ntgk ltsk : (7) 4.3 The Detection Phase We have run a complete alignment experiment based upon the test and signature blocks of the SEA data set to evaluate the alignment parameters and the two scoring systems. The test blocks are the actual SEA testing data and they differ from those described in the initialization module. To simplify a comparison with other approaches, we use the ROC curve and the Maxion-Townsend cost function defined in Eq. (1). Our experimentation focuses on the effects of the align- ment parameters on the false positive and false negative rates and on the hit ratio. This experiment did not apply the maximum test gap module. False positives, false negatives and hits are computed for each user, transformed into the corresponding rates that are then summed and averaged over all 50 users. Equations (8), (9), and (10) show the DDSGA metrics. TotalFalsePositive ¼ Xnu k¼1 fpk nk ! nu ! à 100 (8) Where: - fp ¼ No. of false positive alarms, - n ¼ No. of non-intrusion command sequence blocks, - nu ¼ No. of users (50 in our case) TotalFalseNegative ¼ Xnui k¼1 fnk nik ! nui ! à 100 (9) Where: - fn ¼ No. of false negatives, - ni ¼ No. of intrusion command sequence blocks, - nui ¼ No. of users who have at least one intrusion block Fig. 8. The transition and Backward_Transition matrices respectively. The thick and the thin arrows show, respectively, the optimal path that leads to the maximum alignment score and those that do not lead to the maximum alignment score. TABLE 4 An Example for the Trace Matrix Test sequence ID Length of test sequence (lts) Signature sub- sequence ID Optimal alignment Number of Test gaps (ntg) Avg-align of test sequence i (%) 1 100 1 27 65 .. . .. . .. . .. . .. . 1 100 49 77 22 70.31 .. . .. . .. . .. . .. . .. . 50 100 1 122 9 .. . .. . .. . .. . .. . 50 100 49 102 11 73.75 KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 171
  • 9. TotalHitRatio ¼ 100 À TotalFalseNegative: (10) To plot the ROC curve, the experiment with the tradi- tional SGA algorithm have used distinct values of the align- ment parameters to obtain different false positive rates in the x-axis and the corresponding hit ratios in y-axis. We also repeat the experiment with distinct values of some alignment parameters such as reward for matches and rewards or penalties for mismatches computed by the two scoring systems. Fig. 9 shows the ROC curve for the two scoring systems, the Enhanced-SGA scoring systems, and other detection approaches that also use SEA data set, based upon the previous metrics. According to this figure, the restricted permutation sys- tem results in higher hit ratio with corresponding low false positive rates. As the false positives rate increases, the free permutation system achieves a higher hit ratio than the restricted one because it can tolerate a large number of mutations and deviations in user behaviours. According to this experiment, we have adopted the restricted permuta- tion system as a suitable scoring system for all the phases of DDSGA. Table 5 compares DDSGA using the restricted and the free permutation scoring systems against the cur- rent masquerade detection approaches sorted by Maxion- Townsend cost. The results for the various detection approaches, including all ROC curve values, are published in the references shown in Table 5 and are sorted by “Maxion Townsend cost” that is used to siplify the compar- ison by considering both the false positive and hit ratio. We re-implemented all Coul et al. works in [2], [3] to compare it against DDSGA. The computational enhancement modules. The computa- tional complexity of SGA is rather large. As an example, it requires about 500,000 operations to test one user ses- sion for masquerade attacks in the SEA data set, because the length of the signature sequence is 5,000 while that of the test sequence is 100. The resulting overhead is unacceptable in multiuser environments like cloud systems or in computationally limited devices like wire- less systems. To reduce this overhead, we introduce two computational enhancements that concern, respectively, the Top-Matching Based Overlapping module and the parallelized detection module that are executed in each alignment session. In the following, we outline in details these two modules. 4.3.1 The Top-Matching Based Overlapping Module To align the session patterns to a set of overlapped subse- quences of the user signatures, this module uses the restricted permutation scoring system, Maximum Factor of Test Gaps (mftg) in Eq. (7), and the scoring parameters of each user. As explained in Section 4.1, the TMBO improves the Heuritic Aligning of the Enhanced-SGA. After splitting the signature sequence into a set of over- lapped blocks of length L, see Eq. (5), it chooses the sub- sequence with the highest match to be used in the alignment process. In the worst case, all overlapped Fig. 9. ROC curve for our two scoring systems, SGA ones and other detection approaches. TABLE 5 A Comparison between Our Two Scoring Systems and the Current Detection Approaches Approach Name Hit Ratio percent False Positive percent Maxion-Town- send Cost DDSGA (Restricted Permutation) 83.3 3.4 37.1 DDSGA (Free Permutation) 80.5 3.8 42.3 SGA (Signature Updating) [3] 68.6 1.9 42.8 SGA (Signature Updating þ Heuristic Aligning) [3] 66.5 1.8 44.3 Na€ıve Bayes (With Updating) [11] 61.5 1.3 46.3 SGA (Binary Scoring) [3] 60.3 2.9 57.1 Adaptive Na€ıve Bayes [21] 87.8 7.7 58.4 Recursive Data Mining [18] 62.3 3.7 59.9 Na€ıve Bayes (No Updating) [11] 66.2 4.6 61.4 WRBF-NB [20] 83.1 7.7 63.1 Episode based Na€ıve Bayes [19] 77.6 7.7 68.6 Uniqueness [8] 39.4 1.4 69.0 Hybrid Markov [14] 49.3 3.2 69.9 SGA (Previous Scoring) [2] 75.8 7.7 70.4 Bayes 1-Step Markov [13] 69.3 6.7 70.9 IPAM [15] 41.1 2.7 75.1 SGA (Command Grouping) [3] 42.2 3.5 78.8 Sequence Matching [16] 36.8 3.7 85.4 Compression [6] 34.2 5.0 95.8 172 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 10. subsequences should be aligned as they have the same highest match value. We have verified that on average, the number of alignments is rather smaller because of the variation between the overlapped signature subsequen- ces. The evaluation of the proposed TMBO approach mainly depends upon two parameters: (a) Number of average alignments for the detection process, (b) The effect of the TMBO on false alarm rates and hit ratio. To evaluate our approach with respect to (a) we show through an example how it reduces the alignment com- putations. As far as concerns (b), we use the ROC curve and Maxion-Townsend. To evaluate TMBO, Fig. 10 shows the user session patterns to be aligned to the sig- nature patterns. The configuration phase returns these values: signature_gap_penalty ¼ 9, test_gap_penalty ¼ 5, optimal_score_sys ¼ “Restricted Permutation”, detectio- n_update_threshold ¼ 82.2%, and mftg (maximum factor of test_gaps) ¼ 33%. The first step of TMBO computes the following length of the overlapped subsequences according to Eq. (5): L ¼ n þ mftg à nd eð Þ ¼ 10 þ 33=100 à 10d eð Þ ¼ 10 þ 3:3d e ¼ 10 þ 4 ¼ 14: With respect to the one in the initialization module, the current overlapping runs with length L rather than 2n. Fig. 10 shows the resulting overlapped signature subse- quences of size L ¼ 14. The last subsequence, i.e. subse- quence 15, may be shorter than L, but it is still longer than the test sequence. The second step computes the match corresponding to each subsequence as shown in the front of each subse- quence in Fig. 10. We only consider the matching of subse- quence 1 because those for other subsequences are similar. If we denote by Match(X, s), the minimum between the occurrences of X in, respectively, a user session and in a subsequence, then MMMatchðsssubseq:1Þ ¼ X ðX in A; B; C; D; E; FðMatchðX; subseq:1ÞÞ ¼ X ðMinð3; 1Þ; Minð2; 2Þ; Minð1; 2Þ; Minð1; 2Þ; Minð2; 2Þ; Minð1; 2ÞÞ ¼ X ð1; 2; 1; 1; 2; 1Þ ¼ 8: The third step chooses the top match subsequences, e.g. subsequences 2 and 15 in the example, as the best signature subsequences to be aligned against the test session patterns of the user. To evaluate the reduction of the workload due to TMBO, consider the Number of Asymptotic Computa- tions (NAC) computed by Eq. (11). In the previous example, TMBO reduces the number of alignments from 3000 to 280 with a saving of 90.66 percent. NAC ¼ Avg n align à Sig len à Test len (11) Where: - Avg_n_align is the average number of alignments required for one detection session over all existing users. - Sig_len is the length of the overlapped signature subsequence. - Test_len is the length of the test sequence). To determine if the session patterns of the current user contain a masquerader, the final step compares the highest scores of the previous two alignments against a detectio- n_update_threshold. As explained in the update phase, if at least one of the previous eight alignments has a score larger than or equal to the detection_update_threshold, then an inline update process should be executed for the signature subsequence and the user lexicon. The evaluation using the SEA data set shows that TMBO reduces the maximum number of alignments from 49 to an average of 5.13 alignments per detection, a substantial improvement in detection scenarios. Table 6 shows the asymptotic computations for three detection approaches. The first is our TMBO without the inline update module. The second one is the Heuristic Aligning with signature update [3]. Finally, the third one is the traditional SGA algo- rithm without the Heuristic Aligning or update feature. The NAC per one detection session can be computed as in Eq. (11). If we considered that each of the fifty users in SEA data set has one active session in a multi users system, then Total_NAC ¼ NAC à 50. To evaluate false alarm rates and hit ratios, we have tested TMBO using the ROC curve and Maxion-Townsend score. Table 7 and Fig. 11 show that TMBO has a lower impact on the overall accuracy than other approaches. 4.3.2 The Parallelized Detection Module Since TMBO partitions the user signature into a set of overlapped subsequences, we can parallelize the detection algorithm because it can align the commands in the user test session to each top match signature subsequence sepa- rately. In the example of Section 4.2.1, we can run in parallel the threads to align subsequences 2 and 15. If a thread Fig. 10. Overlapped Signature Subsequences of Size 14. KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 173
  • 11. returns a computed alignment score at least equal to detec- tion_update_threshold, then it sends a “No Masquerader” message and then runs an inline update of both its signature subsequence and the lexicon of the current user. Instead, if the alignment score is lower than the detection_update_th- reshold, the thread raises a “Masquerader Detected” alert. Fig. 12 shows the parallelized detection module processes. We have run several experiments using the SEA data set to evaluate how the parallel version of the module influen- ces the overall performance of detection. The experiments have used an Intel Core 2 Duo Processor E4500 (2M Cache, 2.20 GHz) with 4 GB of memory and running Windows 7 SP1, Machine “A”. Machine “B” is Intel Core i3-2330M (3M Cache, 2.30 GHz) with 6 GB of memory and running Win- dows 7 SP1. The results show that there are two operational modes of the module: Full Parallelization Mode (FPM) and Semi Paral- lelization (SPM) one. The module selects the most appropri- ate one according to the capabilities of the underlying machine and to n_aligns, the number of alignments per detection. It selects the FPM mode if the machine capabilities match n_aligns so that the module achieves the best perfor- mance. This is the most common mode in our experiments. An example is the detection sessions of user 7 where n_aligns ¼ 5, Fig. 13 shows that if five thread are used then each thread runs a distinct alignment and this minimizes the detection time. The SPM mode is selected if the machine capabilities do not match the required n_aligns. This results in a small performance degradation due to inactive threads. The SPM mode is rarely selected in our experiments TABLE 6 TMBO Approach in Three Detection Approaches Approach Name Avg-n-align NAC (1 user) NAC (50 users) DDSGA with L ¼ 145.73 5. 13 5. 13 Ã 145.73 Ã 100 ¼ 74759.49 74759.49 Ã 50 ¼ 3737974.5 SGA with Sig. length ¼ 200 4.5 4.5 Ã 200 Ã 100 ¼ 90,000 90,000 Ã 50 ¼ 4,500,000 Traditional SGA with Sig. length ¼ 200 49 49 Ã 200 Ã 100 ¼ 980,000 980,000 Ã 50 ¼ 49,000,000 TABLE 7 The Masquerade Detection Approaches against DDSGA with Its Two Scoring Systems Approach Name Hit Ratio percent False Positive percent Maxion-Townsend Cost DDSGA (Restricted Permutation, With- out Updating) 83.3 3.4 37.1 DDSGA (Restricted Permutation, With- out Updating) þ TMBO 81.5 3.3 38.3 DDSGA (Free Per- mutation) 80.5 3.8 42.3 SGA (Signature updating) 68.6 1.9 42.8 SGA (Signature updating) þ Heu- ristic 66.5 1.8 44.3 Na€ıve Bayes (With Updating) 61.5 1.3 46.3 SGA (Binary Scor- ing, No Updating) 60.3 2.9 57.1 Fig. 12. The processes of the parallelized detection module. Fig. 11. The impact of our TMBO approach on the system accuracy. 174 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 12. because the fifty users of the SEA data set results in a value of Avg_n_align, equals to 5.13. Hence, on average, the parallelized detection module uses 6 threads to run a detection session. A SPM mode example is the detection sessions of user 23 where n_aligns ¼ 9. Fig. 14 shows that the shortest detection time is reached when running 6 threads in machine “A” and 8 threads in machine “B”. In this case, 3 threads are idle in machine “A” and 1 in machine “B”. 4.4 The Update Phase The update of the user signature patterns is critical because any IDS should be automatically updated to the new legal behaviours of a user. This update is implemented by two modules: the inline update module and the long term update one. 4.4.1 The Inline Update Module This module has two main tasks: 1) Finding areas in user signature subsequences to be updated and augmented with the new user behavior patterns. 2) Update the user lexicon by inserting new commands. In the detection phase, after each alignment, each parallel thread may update both the user signature subsequences and the user lexicon. Three cases are possible in the TBA, see Fig. 15: 1) The test sequence pattern matches the corresponding signature subsequence pattern, 2) A gap is inserted into either or both sequences 3) There is at least a mismatch between the patterns in the two sequences. In case (a), no update is required because the alignment has properly used the symbol in the proper subsequence to find the optimal alignment. Also case (b) does not require an update because symbols that are aligned with gaps are not similar and should be neglected. In case (c), we consider all the mismatches within the cur- rent test sequence. Then, both the signature subsequence and the user lexicon are updated under two conditions. The first one states that we can insert into the user signatures only those patterns that are free of masquerading records. This happens anytime the overall_alignment_score for the current test sequence is larger than or equal to the detectio- n_update_threshold. The second condition states that the current test pattern should have previously appeared in the user lexicon or belongs to the same functional group of the corresponding signature pattern. The two conditions imply that the inline module updates the user lexicon with the new pattern if it does not belong to the lexicon. It also extends the pattern with the current signature subsequence and adds the resulting subsequence to the user signatures without changing the original one. In other words, if a pat- tern in the user lexicon or in the same functional group of its corresponding signature pattern has participated in a con- served alignment, then a new permutation of the behavior of the user has been created. For instance, if the alignment score of the test sequence in Fig. 16 is larger than or equal to the detection_update_threshold, then the pattern ‘E’ at the end of this test sequence has a mismatch with ‘A’ at the end of the signature subsequence. If ‘E’ exists in the user lexicon Fig. 13. FPM and SPM modes for user 7 in Machines “A” and “B”. Fig. 14. FPM and SPM modes for user 23 in Machines “A” and “B”. Fig. 15. The inline update steps. KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 175
  • 13. or belongs to the same functional group of ‘A’, the signature subsequence is augmented so that the position with ‘A’ matches with both ‘A’ or ‘E’. If ‘E’ does not belong to the lex- icon, it is also inserted into it. This simply embeds observed variations in the signature sequence without destroying any information it encodes. In this case, a new augmented sub- sequence (HEE) is inserted into the user signature subse- quences. If only the first condition is satisfied, only the user lexicon is updated so that the following checks use the new pattern. In fact, it is highly likely that a pattern that has appeared within a conserved, high scoring alignment has been created by the legitimate user. Besides improving the computational performance of system update, the module also improves the signature update scheme of the Enhanced-SGA [3] as following: 1) It uses the parameters, the threshold, and the scoring system returned by the configuration phase. 2) It runs in parallel with the detection phase and starts as soon as the alignment score is computed. Instead, the signature update scheme runs independently after each detection process and it repeats the back- ward tracing step. 3) It improves flexibility in the signature update by considering any occurrence of commands permuta- tions or functionality matching. To evaluate how the inline update module reduces the false alarm rates and improves the hit ratio, we have used the ROC curve and the Maxion-Townsend score after apply- ing the inline update module. Fig. 16 and Table 8 show that the inline update module reduces the false alarm rates and increases the hit ratio. Therefore, it significantly improves the accuracy with respect to other approaches. 4.4.2 The Long Term Update Module This module reconfigures the system parameters through the outputs of the inline update module. There are three strategies to run the module: periodic, idle time, and thresh- old. The proper one is selected according to the characteris- tic and requirements of the monitored system. The periodic strategy runs the reconfiguration step with a fixed frequency, i.e. 3 days or 1 week. To reduce the over- head, the idle time strategy runs the reconfiguration step anytime the system is idle. This solution is appropriate in highly overloaded systems that require an efficient use of the network and computational resources. The threshold strat- egy runs the reconfiguration step as soon as the number of test patterns inserted into the signature sequences reaches a threshold that is distinct for each user and frequently updated. This approach is highly efficient because it runs the module only if the signature sequence is changed. 5 CONCLUSION AND FUTURE WORK Masquerading is by far one of the most critical attacks because an attacker that can successfully logs to a system can also maliciously control it. The semi-global alignments (SGA) is based upon sequence alignment and it is one of the most effective detection techniques that can be applied to distinct sequences of audit data. While SGA may result in low false positive and missing alarms rates, even its enhanced version has not yet achieved the level of accu- racy and performance for practical deployment. This is the reason underlying the design of the Data-Driven Semi-Global Alignment Approach, DDSGA. From the security efficiency perspective, DDSGA models more accu- rately the consistency of the behaviour of distinct users by introducing distinct parameters. Furthemore, it offers two scoring systems that tolerate changes in the low-level representation of the commands functionality by Fig. 16. The impact of the inline update on the system accuracy. TABLE 8 Masquerade Detection Approaches against DDSGA with Its Inline Update Module Approach Name Hit Ratio percent False Positive percent Maxion-Townsend Cost DDSGA (Inline Update þ TMBO þ Restricted Permutation) 88.4 1.7 21.8 DDSGA (Restricted Permutation, Without Updating) 83.3 3.4 37.1 DDSGA (Restricted Permutation, Without Updating) þ TMBO 81.5 3.3 38.3 DDSGA (Free Permutation) 80.5 3.8 42.3 SGA (Signature updating) 68.6 1.9 42.8 SGA (Signature updating) þ Heuristic 66.5 1.8 44.3 Na€ıve Bayes (With Updating) 61.5 1.3 46.3 SGA (Binary Scoring, No Updating) 60.3 2.9 57.1 176 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015
  • 14. categorizing user commands and aligning commands in the same class without reducing the alignment score. The scoring systems also tolerates both permutation of its commands and changes in the user behaviour over time. All these features strongly reduce false positive and miss- ing alarm rates and improve the detection hit ratio. In the experiments using the SEA data set, the performance of DDSGA is always better than the one of SGA. From the computational perspective, the Top-Matching Based Over- lapping approach reduces the computational load of alignment by decomposing the signature sequence into a smaller set of overlapped subsequences. Furthermore, the detection and the update processes can be parallelized with no loss of accuracy. For future work, we plan to apply our approach to detect masquerade attacks in cloud environment by improing our CIDS framework [4]. As a first step, we have developed a new data set, CIDD [10] that includes distinct audit data from distinct host operating systems and physical network environment. This will supports an evaluation of DDSGA that can use different kinds of audit sequences. ACKNOWLEDGMENTS This work was supported by QNRF (a member of Qatar Foundation), Fayoum University, Pisa University, AFOSR DDDAS award # FA9550-12-1-0241, and the US National Science Foundation (NSF) awards IIP-0758579, DUE- 1303362, and SES-1314631. REFERENCES [1] A. H. Phyo and S. M. Furnell. “A detection-oriented classification of insider it misuses,” in Proc. 3rd Security Conf. 2004. [2] S. E. Coull, J. W. Branch, B. K. Szymanski, and E. A. Breimer, “Intrusion detection: A bioinformatics approach,” in Proc. 19th Annu. Comput. Security Appl. Conf., Las Vegas, NV, USA, Dec. 2003, pp. 24–33. [3] S. E. Coulla and B. K. Szymanski, “Sequence alignment for masquerade detection,” J. Comput. Statist. Data Anal., vol. 52, no. 8, pp. 4116–4131, Apr. 2008. [4] Hisham A. Kholidy and Fabrizio Baiardi, “CIDS: A framework for intrusion detection in cloud systems,” in Proc. 9th Int. Conf. Inf. Technol.: New Generations, Las Vegas, Nevada, USA, Apr. 2012, pp. 16–18. [5] (2001) [Online]. Available: http://www.schonlau.net/intrusion. html [6] M. Schonlau, W. DuMouchel, W. Ju, A. F. Karr, M. Theus, and Y. Vardi, “Computer intrusion: Detecting masquerades,” Statist. Sci. vol. 16, no. 1, pp. 58–74, 2001. [7] “Greenberg: Using unix: Collected traces of 168 users,” Dept. Comput. Sci., Univ. Calgary, Calgary, Canada, Res. Rep. 88/333/ 45, 1988. [8] T. Lane and C. E. Brodley, “An application of machine learning to anomaly detection,” in Proc. 20th Nat. Inf. Syst. Security Conf., 1997, pp. 366–380. [9] RUU data set: (2008) [Online] Available: http://sneakers.cs. columbia.edu/ids/RUU/data/. [10] Hisham. A. Kholidy and Fabrizio Baiardi, “CIDD: A cloud intru- sion detection data set for cloud computing and masquerade attacks,” in Proc. 9th Int. Conf. Inf. Technol.: New Generations, Las Vegas, NV, USA, Apr. 2012, pp. 16–18. [11] R. A. Maxion and T. N. Townsend, “Masquerade detection using truncated command lines,” in Proc. Int. Conf. Dependable Syst. Netw., Washington, DC, USA, Jun. 2002, pp. 219–228. [12] R. Posadas, J. C. Mex-Perera, R. Monroy, and J. A. Nolazco-Flores, “Hybrid method for detecting masqueraders using session fold- ing and hidden markov models,” in Proc. 5th Mexican Int. Conf. Artif. Intell., 2006, pp. 622–631. [13] W. Dumouchel. (1999). “Computer intrusion detection based on Bayes Factors for comparing command transition probabilities”. Technical report 91, National Institute of Statistical Sciences, [Online]. Available: www.niss.org/downloadabletechreports. html [14] W. Ju and Y. Vardi. (1999). “A hybrid high-order Markov chain model for computer intrusion detection”. Nat. Inst. Statist. Sci. Research Triangle Park, NC, USA, Tech. Rep. 92. [Online]. Avail- able: www.niss.org/downloadabletechreports.html [15] Brian D. Davison and Haym Hirsh, “Predicting sequences of user actions,” in Proc. Joint Workshop Predicting Future: AI Approaches Time Ser. Anal., 1998, pp. 5–12. [16] T. Lane and C. E. Brodley, “Approaches to online learning and concept drift for user identification in computer security,” in Proc 4th Int. Conf. Knowl. Discovery Data Mining, New York, NY, USA, Aug. 1998, pp. 259–263. [17] B. Christopher, “A tutorial on support vector machines for pattern recognition,” Data Mining Knowl. Discovery, vol. 2, no. 2, pp. 121– 167, 1998. [18] B. Szymanski and Y. Zhang, “Recursive data mining for masquer- ade detection and author identification,” in Proc. IEEE 5th Syst., Man .Cybern. Inf. Assurance Workshop, West Point, NY, USA, Jun. 2004, pp. 424–431. [19] S. K. Dash, K. S. Reddy, and A. K. Pujari, “Episode based mas- querade detection,” in Proc. 1st Int. Conf. Inf. Syst. Security, 2005, pp. 251–262. [20] A. Sharma and K. K. Paliwal, “Detecting masquerades using a combination of Na€ıve Bayes and weighted RBF approach,” J. Com- put. Virology, vol. 3, no. 3, pp, 237–245, 2007. [21] Subrat Kumar Dash, K. S. Reddy, and K. A. Pujari, “Adaptive Naive Bayes method for masquerade detection”, Security Com- mun. Netw., vol. 4, no. 4, pp. 410–417, 2011. [22] S. Malek and S. Salvatore, “Detecting masqueraders: A compari- son of one-class bag-of-words user behavior modeling techniques,” in Proc. 2nd Int. Workshop Managing Insider Security Threats, Morioka, Iwate, Japan. Jun. 2010, pp. 3–13. [23] A. S. Sodiya, O. Folorunso, S. A. Onashoga, and P. O. Ogundeyi, “An improved semi-global alignment algorithm for masquerade detection,” Int. J. Netw. Security, vo1. 12, no. 3, pp. 211–220, May 2011. [24] T. F. Smith and M. S. Waterman, “Identification of common molec- ular subsequences,” J. Molecular Biol., vol. 147, pp. 195–197, 1981. [25] D. B. Christopher and T. D. Herbert, “Receiver operating charac- teristic curves and related decision measures: A tutorial,” Chemo- metrics Intell. Lab. Syst., vol. 80, pp. 24–38, 2006. [26] K. Wang and S. J. Stolfo, “One class training for masquerade detection,” in Proc. IEEE 3rd Conf. Workshop Data Mining Comput. Secur., Florida, Nov. 2003. Hesham AbdElazim Ismail Mohamed Kholidy received the PhD degree in computer science and engineering in a joint PhD program between the University of Pisa in Italy and the University of Arizona. He was an associate researcher in the US National Science Foundation (NSF) Cloud and Autonomic Computing Center, Electrical and Computer Engineering Department, University of Arizona, and a research fellow at Galileo Galilei Doctorate School at the University of Pisa. He recently published a new patent in Data Encryp- tion in the United State Patent and Trademark Office (USPTO). He pub- lished more than 25 papers in top journals and conferences, and he supervises a group of master’s and PhD students. His research interests include information security, distributed and high-performance systems including Grid and Cloud computing systems, critical and SCADA sys- tems, and bioinformatics. KHOLIDY ET AL.: DDSGA: A DATA-DRIVEN SEMI-GLOBAL ALIGNMENT APPROACH FOR DETECTING MASQUERADE ATTACKS 177
  • 15. Fabrizio Baiardi received the degree from Uni- versita di Pisa, where he is currently a full profes- sor of computer science. He has chaired the master’s degree on computer security, and teaches some courses on the security of cloud systems. His main research interests include for- mal tools for the risk assessment and manage- ment of cloud systems and critical infrastructures. He is currently involved in several projects on intrusion detection of cloud and SCADA systems. Salim Hariri received the MSc degree from Ohio State University in 1982 and the PhD degree in computer engineering from the University of Southern California in 1986. He is a professor in the Electrical Communication Engineering Department at the University of Arizona and the director of the US National Science Foundation (NSF) Center: Cloud and Autonomic Computing Center. His current research focuses on high-per- formance distributed computing, design and anal- ysis of high-speed networks, and developing software design tools for high-performance systems. He has coauthored more than 200 journal and conference research papers, and is a coau- thor/editor of three books on parallel and distributed computing. He is a member of IEEE Computer Society. For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib. 178 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 12, NO. 2, MARCH/APRIL 2015