Temporal Relations Mining Approach to Improve
Dengue Outbreak and Intrusion Threats Severity
Prediction Accuracy
Nurfadhlina Mohd Sharef, Nor Azura Husin, Khairul Azhar Kasmiran, Mohd Izuan Ninggal
Department of Computer Science, Faculty of Computer Science and Information Technology,
Universiti Putra Malaysia, UPM Serdang, Selangor, Malaysia
{nurfadhlina,n_azura,k_azhar,mohdizuan}@upm.edu.my
2017 FOURTH ASIAN CONFERENCE ON DEFENSE TECHNOLOGY, TOKYO
• Temporal relations and abstractions
• order instances chronologically and by periodic patterns
• predict movement (increase or decrease) or quantify the possibility that the predicted event will happen
• identify whether a sequence of supporting events occurs prior to the occurrence of the target
• focus on the values of the time series attribute
• not yet explored for dengue or intrusion prediction
• Intrusion prediction
• Time series for intrusion prediction
• binary forecasting (i.e., increase or decrease in the number of attacks)
• numerical prediction
• anomaly detection
• Machine learning for threat prediction
• factors such as causal networks, attacker IP, and patch levels
• Vector control
• physical activities to eliminate mosquito breeding spots
• not based on temporal patterns or on the impact of the number of recent cases
RELATED WORKS
• Previously
• data within different time series
• threat factor profiling
• model to predict the ATL and compute the OTL
• source integration, fuzzifying the threat severity, and computing the asset-based (ATL) and organisational (OTL) threat levels
• temporal data are hourly, daily, weekly, and so on
• did not explore the prediction of threat occurrence from the temporal relations in the occurrence sequence of other threats
• Dengue outbreak prediction
• did not consider temporal relations
CONTRIBUTIONS
• This paper
• features + temporal relations pattern mining + temporal aggregation technique
• monitoring the direction of the frequency of attacks (steady, increasing, or decreasing within the s values)
• rising rainfall during the northeast monsoon precedes a decreasing trend in dengue cases
• Questions
• Which factors contribute to the performance of the prediction model for dengue cases and network threat severity?
• How can temporal relations mining be utilized for the prediction of dengue cases and network threat severity?
• What is the performance of the machine learning models in predicting dengue cases and network threat severity?
Temporal abstractions (Shahar, 1997) and temporal logic
(Allen & Ferguson, 1994) are used to define patterns that
describe temporal interactions among multiple time
series.
The prediction tasks utilize trend-based features and
complex temporal patterns (i.e., behaviours such as ‘before’ and
‘co-occurs’, which are among the relations in Allen’s
temporal logic).
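The two relations named above can be illustrated with a minimal sketch. The slides do not define them formally, so the definitions below are assumptions: `before` holds when one interval ends strictly before the other begins, and `co_occurs` holds when the two intervals share at least one time point. The `Interval` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed time interval [begin, end] (hypothetical helper type)."""
    begin: int
    end: int


def before(a: Interval, b: Interval) -> bool:
    # Assumed definition: a finishes strictly before b starts.
    return a.end < b.begin


def co_occurs(a: Interval, b: Interval) -> bool:
    # Assumed definition: the two intervals share at least one time point.
    return a.begin <= b.end and b.begin <= a.end


rain_rising = Interval(1, 3)     # illustrative values only
cases_falling = Interval(5, 8)
print(before(rain_rising, cases_falling))     # rainfall trend precedes case trend
print(co_occurs(rain_rising, cases_falling))  # no shared time point
```

Other readings of ‘co-occurs’ (for example, requiring that one interval starts inside the other) are possible; the check would change accordingly.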
• Let D = {<x_i, y_i>} be a dataset such that x_i ∈ X is the event record for object i up to
time t_i, and y_i ∈ Y is a numerical value (or class label) associated with attributes that
relate to the event at time t_i.
• Learn a function f: X → Y that can accurately predict the value (or class label) of a
future event.
• A space transformation ψ: X → X′ maps each event instance x_i to a fixed-size
feature vector that preserves the predictive temporal characteristics of x_i as much as
possible.
• Object i, O_i, is represented by a series of instances sorted according to the state
sequence.
• A state is an abstraction of a specific attribute. For example, state E: A_i = D
represents a decreasing trend in the values of temporal variable A_i.
• A state interval is a state that holds during an interval: (E, b_i, e_i) is a realization of
state E in a data instance, with a specific start time (b_i) and end time (e_i).
• A state sequence is a series of state intervals ordered
according to their start times.
• After abstracting all temporal variables, we represent every instance (e.g., dengue
case, attack log) in the database D as a state sequence. As a result, D can be viewed
as a set of state sequences.
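The definitions above map naturally onto a small data structure. The following is a sketch only: the `StateInterval` type, its field names, and the tie-breaking on end time are assumptions made for illustration, not part of the original formulation.

```python
from typing import List, NamedTuple


class StateInterval(NamedTuple):
    """Realization (E, b, e) of a state E: variable = value (hypothetical type)."""
    variable: str  # temporal variable A_i, e.g. "cases"
    value: str     # abstraction value, e.g. "D" for a decreasing trend
    begin: int     # start time b
    end: int       # end time e


def state_sequence(intervals: List[StateInterval]) -> List[StateInterval]:
    # Order state intervals by start time; ties are broken by end time,
    # which is an assumption since the slides only mention start times.
    return sorted(intervals, key=lambda iv: (iv.begin, iv.end))


instance = [
    StateInterval("cases", "D", 5, 8),      # illustrative values only
    StateInterval("rainfall", "I", 1, 3),
]
for interval in state_sequence(instance):
    print(interval)
```

Under this representation the database D is simply a list of such sorted sequences, one per instance.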
Step 1
• Sort the data according to the sequence of occurrence.
• Define the temporal window size, s, based on the temporal interval length, l, of the
dataset (e.g., hourly, daily, weekly, monthly, or annually).
Step 2
• Determine the aggregation operator, op, such as summation or frequency of occurrence.
• For example, in the cyber threat dataset we may be interested in the total
occurrence of each threat by severity level or by threat category.
Step 3
• Determine b_i and e_i for s as part of the window sliding process.
• Data recorded within the window are aggregated with the chosen
operator.
Step 4
• Time series variables are converted into a higher-level representation of temporal
trends using temporal abstractions.
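The four steps can be sketched as follows, under stated assumptions: timestamps are integers in some base unit, op is frequency of occurrence, and the trend abstraction compares each aggregated value with the value s intervals earlier, labelling it I (increasing), D (decreasing), or S (steady). The function names and the exact trend rule are hypothetical; the slides do not specify them.

```python
from collections import Counter
from typing import List


def aggregate_by_interval(timestamps: List[int], length: int) -> List[int]:
    """Steps 1-3: sort events and count them per interval of `length` units."""
    counts = Counter(t // length for t in sorted(timestamps))
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Intervals with no recorded events get a frequency of 0.
    return [counts.get(k, 0) for k in range(first, last + 1)]


def trend_states(series: List[int], s: int = 1) -> List[str]:
    """Step 4: abstract each value into a trend relative to s intervals earlier."""
    labels = []
    for i in range(s, len(series)):
        diff = series[i] - series[i - s]
        labels.append("I" if diff > 0 else "D" if diff < 0 else "S")
    return labels


event_hours = [0, 0, 1, 3, 3, 3]               # illustrative event times in hours
hourly = aggregate_by_interval(event_hours, 1)  # frequency per hour
print(hourly)
print(trend_states(hourly, s=1))
```

A summation operator would replace the counter with a per-interval sum of the recorded values; the sliding and abstraction logic would stay the same.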
• Root Mean Squared Error (RMSE)
• Multilayer Perceptron (MLP)
• Support vector machine for regression (SMOReg)
• For the MLP, the learning rate is 0.3 and the momentum is 0.2.
• For SMOReg, the polynomial kernel K(x, y) = <x, y>^p or K(x, y) =
(<x, y> + 1)^p is used with exponent p = 1.0. RegSMOImproved is used as the
optimizer with epsilon = 1.0E-12, epsilon parameter of the epsilon-insensitive
loss function = 0.01, seed = 1, tolerance = 0.001, and variant = variant1.
• Data are normalized to [0, 1].
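For reference, the evaluation metric, the normalization, and the kernel listed above can be written out in a few lines. This is a plain-Python sketch of the standard formulas, not the toolkit implementation used in the experiments; the function names are hypothetical, and min-max scaling is assumed as the method behind "normalized to [0, 1]".

```python
import math
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def poly_kernel(x: Sequence[float], y: Sequence[float],
                p: float = 1.0, lower_order: bool = False) -> float:
    """K(x, y) = <x, y>^p, or (<x, y> + 1)^p when lower_order is True."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1) ** p if lower_order else dot ** p


print(min_max_normalize([2, 4, 6]))
print(rmse([0.0, 0.0], [3.0, 4.0]))
print(poly_kernel([1, 2], [3, 4]))
```

With exponent p = 1.0 the first kernel form is just the inner product <x, y>, that is, a linear kernel.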
DENGUE OUTBREAK PREDICTION
• Dataset from the Ministry of Health (MoH), Malaysia
• Predict NL and NC
• Records of dengue cases in 2010 and 2011 for Hulu Langat, Selangor, Malaysia
• l = week, s = 1
NETWORK THREAT PREDICTION
• Warn of threat occurrence before the attack is received
• IPS
• labels the severity level (i.e., Low, Info, and High)
• blocks, permits, or allows access, but not according to the severity level
Setting10=Setting1
Setting11=Setting2
Setting12=Setting3
Setting13=Setting5
Setting14=Setting7
op: frequency of the threat cases, l = hourly
s=1: one hour earlier on the same day,
s=2: two hours earlier on the same day,
s=11: at the same hour, the day before,
s=12: at the same hour, two days before.
TH: threat with severity=High,
TI: threat with severity=Info,
TL: threat with severity=Low.
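The window settings above can be read as lagged features over an hourly frequency series. The sketch below is one possible interpretation and rests on assumptions: s=1 and s=2 are taken as lags of 1 and 2 hours, s=11 and s=12 as lags of 24 and 48 hours, and the "same day" restriction for s=1 and s=2 is ignored for simplicity. `LAG_HOURS` and `lag_features` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed mapping from window setting s to a lag in hours.
LAG_HOURS: Dict[int, int] = {1: 1, 2: 2, 11: 24, 12: 48}


def lag_features(hourly_counts: Sequence[int],
                 settings: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Pair each hour's count (target) with the counts at the chosen lags."""
    lags = [LAG_HOURS[s] for s in settings]
    start = max(lags)  # first hour for which every lag is available
    rows = []
    for t in range(start, len(hourly_counts)):
        features = [hourly_counts[t - lag] for lag in lags]
        rows.append((features, hourly_counts[t]))
    return rows


th_counts = list(range(50))  # illustrative hourly counts of High-severity threats
rows = lag_features(th_counts, settings=[1, 11])
print(rows[0])   # features: previous hour and same hour the day before
print(len(rows))
```

The same construction applies per severity series (TH, TI, TL), giving one set of lagged columns for each.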
• Temporal data mining
• an intuitive method for finding correlations or sequential patterns in data streams
• The results of the experiments
• the sliding window size influences the performance of the model
• the configuration is specific to the dataset domain
• various training and testing ratios
• a larger training portion helps the model perform more reliably
• Future works
• evaluate the approach on more comprehensive dengue and
intrusion logs
• study series of sequential relations
• multiple-event temporality, such as combining day and hour to
observe whether a similar pattern of occurrence exists at a
specific hour across several days
• multiple event orderings such as overlapping, start-after, end-by, and so
on
• Deep appreciation to the ACDT17 committee for inviting and
sponsoring my presentation and participation in this conference
• Dr. Maslina Zolkepli for the discussion on the possibility of developing a
prediction model for water quality preservation
• Another note of appreciation goes to Mrs. Norhashimah Mat
Sejani and Dr. Mohd Taufik Abdullah for their support with the
intrusion data preparation and discussion
• Special thanks to the cyber threat profiling research
team for the various meetups, which inspired the approach
introduced in this paper