A Multiple Query Optimization Scheme
for Change Point Detection
[POSITION PAPER]
Masahiro Oke, Hideyuki Kawashima
University of Tsukuba, Japan
Outline
• Background
• In-DSMS analytics (philosophy & system)
• CPD (Change Point Detection)
• Proposal: MQO for multiple CPDs
• Experiment
• Summary
Quick Review: Data Stream Management System (DSMS)
Q1: How many packets arrived for port 80 in the last minute?
SELECT COUNT(*)
FROM eth0[TIME 1 MIN]
WHERE port = 80
Relational schema of stream eth0:
・Destination IP
・Source IP
・Destination Port
・Source Port
・Interface (e.g. eth0)
・Length
・Version (e.g. IPv4)
・Payload
• SQL is translated into an operator tree.
• On each data arrival, the tree is evaluated.
• Operators are based on relational database operators
– w (Window): cuts a finite relation out of a stream
– σ (Selection): filter
– α (Aggregation): e.g. AVG, MIN, MAX
[Figure: users/apps register query Q1 with the SPS; data flows through the input adapter, then w, σ and α, then the output adapter, and the result is returned to the users/apps]
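The evaluation model above (an operator tree w → σ → α that is re-evaluated on every arrival) can be sketched as follows. This is a minimal illustration, not the implementation of Falcon or of any real DSMS; the names `Packet`, `TimeWindow`, `select`, `count` and `run_q1` are hypothetical, and the window is assumed to cover the half-open interval (t - 60 s, t].

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds (hypothetical field)
    port: int


class TimeWindow:
    """w: keeps the tuples that arrived within the last `width` seconds."""

    def __init__(self, width):
        self.width = width
        self.buf = deque()

    def push(self, pkt):
        self.buf.append(pkt)
        # Evict tuples outside (t - width, t]; the new packet itself is never
        # evicted (width > 0), so buf[0] always exists here.
        while self.buf[0].timestamp <= pkt.timestamp - self.width:
            self.buf.popleft()
        return list(self.buf)


def select(relation, predicate):  # sigma: filter
    return [t for t in relation if predicate(t)]


def count(relation):  # alpha: COUNT(*)
    return len(relation)


def run_q1(stream, width=60.0):
    """Re-evaluate w -> sigma -> alpha on every arrival; yields one COUNT(*) per packet."""
    window = TimeWindow(width)
    for pkt in stream:
        yield count(select(window.push(pkt), lambda t: t.port == 80))
```

For example, packets to ports 80, 22, 80 and 80 arriving at 0 s, 10 s, 30 s and 65 s yield the counts 1, 1, 2, 2: by 65 s the first packet has left the one-minute window.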
Our Target Application: Malware Detection
• Real datasets
– Real trace logs of malware activities
• NICTER
– Maintains about 160,000 unused IP addresses (a darknet)
• Packets sent to the darknet are regarded as attacks.
– Uses CPD (Change Point Detection) [1] to detect attacks such as DoS (denial of service).
[1] D. Inoue, K. Yoshioka, M. Eto, M. Yamagata, E. Nishino, J. Takeuchi, K. Ohkouchi, and K. Nakao, “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques,” ICONIP (1), pp. 579-586, 2008.
[2] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE TKDE, pp. 482-492, 2006.
Yet Another DSMS: Falcon
• Relational data processing: aggregates are good, but are they enough?
• Attack detection: CPD (AR) / LOF / LDA / FIM
Example Query on Falcon (1/2)
• Number of accesses to each port? [1]
• Group-by aggregates
SELECT dst_port,
COUNT(dst_port)
FROM pkt[1 sec]
GROUP BY dst_port
Schema of stream g-pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content
[Figure: five packets (dst_port 22, 80, 15, 80, 22) arrive at the NIC within 1 second; result: 22: 2, 80: 2, 15: 1]
[1] D. Srivastava (AT&T Labs) et al., “Enabling Real Time Data Analysis,” keynote talk, VLDB, 2010 (a similar query appears on p. 15 of the talk slides).
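The group-by aggregate above can be sketched in a few lines. This assumes tumbling (non-overlapping) 1-second windows, which the slide does not state; `group_count` is a hypothetical helper and packets are modeled as `(timestamp_sec, dst_port)` pairs.

```python
from collections import Counter, defaultdict


def group_count(packets, width=1.0):
    """SELECT dst_port, COUNT(dst_port) FROM pkt[width sec] GROUP BY dst_port.

    packets: iterable of (timestamp_sec, dst_port) pairs.
    Returns {window_index: {dst_port: count}}, assuming tumbling windows.
    """
    windows = defaultdict(Counter)
    for ts, port in packets:
        windows[int(ts // width)][port] += 1
    return {w: dict(c) for w, c in sorted(windows.items())}


# The five packets of the figure, with made-up timestamps inside one second:
print(group_count([(0.1, 22), (0.3, 80), (0.5, 15), (0.7, 80), (0.9, 22)]))
```

With these inputs, window 0 contains the counts 22: 2, 80: 2, 15: 1, matching the figure.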
Example Query on Falcon (2/2)
• Anomalous accesses to each port? [2]
• Outlier score for each port per second
SELECT dst_port,
cpd(dst_port)
FROM pkt[1 sec]
GROUP BY dst_port
Schema of stream g-cpd-pkt: src_ip, dst_ip, src_port, dst_port, seq_no, packet_size, timestamp, protocol, ack, fin, syn, urg, push, reset, content
[Figure: the same five packets (dst_port 22, 80, 15, 80, 22) arrive at the NIC within 1 second; result (outlier scores): 22: 1.33, 80: 2.44, 15: 1.22]
[2] D. Inoue (NICT) et al., “An Incident Analysis System NICTER and Its Analysis Engines Based on Data Mining Techniques,” ICONIP (1), pp. 579-586, 2008.
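Unlike COUNT, `cpd(dst_port)` is stateful: with GROUP BY, each port needs its own learned model that persists across windows. The sketch below illustrates only that per-group state. It assumes the time series for a port is its per-window packet count, and it uses a simple exponentially weighted squared-deviation score (`EwmaScorer`, hypothetical) as a stand-in; it is NOT the CPD algorithm used by Falcon or NICTER.

```python
from collections import Counter


class EwmaScorer:
    """Hypothetical stand-in for cpd(): exponentially weighted (x - mean)^2 / var.
    It is not the paper's CPD; it only shows that each group keeps state."""

    def __init__(self, r=0.1):
        self.r, self.mean, self.var = r, 0.0, 1.0

    def score(self, x):
        s = (x - self.mean) ** 2 / self.var  # score before learning from x
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        return s


class GroupedScorer:
    """One scorer per dst_port, as `cpd(dst_port) ... GROUP BY dst_port` implies."""

    def __init__(self, make_scorer):
        self.make_scorer = make_scorer
        self.state = {}  # dst_port -> scorer, kept across windows

    def on_window(self, dst_ports):
        """dst_ports: destination ports of the packets in one 1-second window."""
        out = {}
        for port, n in Counter(dst_ports).items():
            if port not in self.state:
                self.state[port] = self.make_scorer()
            out[port] = self.state[port].score(n)
        return out
```

In this sketch, ports absent from a window are simply not updated; whether a zero count should be fed instead is a design choice the slide does not specify.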
Change Point Detection (CPD)
• Outlier detection technique over time-series data
– Two-stage learning based on an autoregressive (AR) model
– Applications: traffic analysis, stock price analysis
[Figure: applying CPD to a time series to find its change points]
[1] J. Takeuchi and K. Yamanishi, “A Unifying Framework for Detecting Outliers and Change Points from Time Series,” IEEE Transactions on Knowledge and Data Engineering, pp. 482-492, 2006.
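For reference, one learning stage can be summarized as below. This is our reading of the SDAR (sequentially discounting AR) learning step in [1] for a one-dimensional series, not a quotation of it; we assume that the discount rate r, the AR order k and the smoothing window T correspond to the slides' α_R, α_K, α_T (1st stage) and β_R, β_K, β_T (2nd stage). The 2nd stage applies the same equations to the moving-average series y_t.

```latex
\begin{aligned}
\hat\mu &\leftarrow (1-r)\,\hat\mu + r\,x_t \\
C_j &\leftarrow (1-r)\,C_j + r\,(x_t-\hat\mu)(x_{t-j}-\hat\mu), && j = 0,\dots,k \\
C_j &= \textstyle\sum_{i=1}^{k} \omega_i\, C_{j-i}, \quad C_{-i} = C_i, && j = 1,\dots,k \\
\hat x_t &= \hat\mu + \textstyle\sum_{i=1}^{k} \omega_i\,(x_{t-i}-\hat\mu) \\
\hat\sigma^2 &\leftarrow (1-r)\,\hat\sigma^2 + r\,(x_t-\hat x_t)^2 \\
\mathrm{Score}(x_t) &= -\log p_{t-1}(x_t \mid x_{t-k},\dots,x_{t-1}) \\
y_t &= \frac{1}{T}\textstyle\sum_{i=t-T+1}^{t} \mathrm{Score}(x_i)
\end{aligned}
```

Here p_{t-1} is the Gaussian predictive density with mean x̂_t and variance σ̂², using the model learned up to time t-1, and the third line is the Yule-Walker system solved for ω.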
Dividing CPD into 4 operators
1. 1st stage learning: learns from the input time-series data and provides a probability
2. Compute outlier score and moving-average score, using the probability provided by the 1st stage learning
3. 2nd stage learning: learns from the moving-average score series and provides a probability
4. Compute outlier score and moving-average score, using the probability provided by the 2nd stage learning
(The figure omits the outlier score output of the 1st stage.)
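The 4-operator pipeline can be sketched as follows. This is a simplified, hypothetical implementation in the spirit of SDAR-based two-stage CPD, not the authors' C++/Eigen code: `SDAR.step` returns the predictive mean and variance before learning x_t (as a simplification, the variance is updated with this predictive residual), `outlier_score` is the Gaussian log loss, and `stage` bundles one learning operator with its score/moving-average operator. The mapping of (r, k, T) to α_R, α_K, α_T and β_R, β_K, β_T is an assumption.

```python
import math
from collections import deque

import numpy as np


class SDAR:
    """Sequentially discounting AR(k) learner for a 1-D series (sketch)."""

    def __init__(self, r, k):
        self.r, self.k = r, k          # discount rate, AR order
        self.mu = 0.0                  # discounted mean
        self.c = np.zeros(k + 1)       # discounted autocovariances C_0..C_k
        self.w = np.zeros(k)           # AR coefficients omega_1..omega_k
        self.var = 1.0                 # discounted residual variance
        self.past = deque(maxlen=k)    # x_{t-1}, ..., x_{t-k} (newest first)

    def step(self, x):
        """Return the predictive (mean, variance) for x, then learn from x."""
        mean, var = self.mu, self.var
        if len(self.past) == self.k:
            mean += float(self.w @ (np.array(self.past) - self.mu))
        r = self.r
        self.mu = (1 - r) * self.mu + r * x
        self.c[0] = (1 - r) * self.c[0] + r * (x - self.mu) ** 2
        for j, xj in enumerate(self.past, start=1):
            self.c[j] = (1 - r) * self.c[j] + r * (x - self.mu) * (xj - self.mu)
        # Yule-Walker equations: Toeplitz(C_0..C_{k-1}) @ omega = (C_1..C_k)
        toeplitz = np.array([[self.c[abs(i - j)] for j in range(self.k)]
                             for i in range(self.k)])
        self.w = np.linalg.lstsq(toeplitz, self.c[1:], rcond=None)[0]
        self.var = max((1 - r) * self.var + r * (x - mean) ** 2, 1e-12)
        self.past.appendleft(x)
        return mean, var


def outlier_score(x, mean, var):
    """Log loss -log N(x; mean, var) under the predictive Gaussian."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)


def stage(series, r, k, T):
    """Learning operator followed by the outlier-score / moving-average operator."""
    learner, recent, out = SDAR(r, k), deque(maxlen=T), []
    for x in series:
        mean, var = learner.step(x)                 # learning operator
        recent.append(outlier_score(x, mean, var))  # outlier score
        out.append(sum(recent) / len(recent))       # T-point moving average
    return out


def cpd(series, alpha, beta):
    """Two-stage CPD; alpha and beta are (r, k, T) triples (assumed naming)."""
    return stage(stage(series, *alpha), *beta)
```

For a step series such as `[0.0] * 50 + [10.0] * 50`, the 1st-stage outlier score (with T = 1) peaks exactly at the step, index 50, and `cpd(series, (0.02, 2, 5), (0.02, 2, 5))` returns one change-point score per input value. Writing each stage as learner plus scorer makes the operator boundaries explicit, which is what the sharing schemes in the following slides rely on.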
Problem of CPD: Parameter setting
• The detection result depends on the parameter set
[Figure: CPD output using an appropriate parameter set vs. using an inappropriate parameter set]
A simple way for parameter tuning: multiple CPDs with different parameter sets
[Figure: each input packet is processed by several CPDs, one per parameter set; each CPD runs 1st stage learning, compute outlier score, 2nd stage learning and compute outlier score; the CPD results are combined by result aggregation (e.g. majority voting)]
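The result-aggregation step can be illustrated with a simple majority vote. This is a minimal sketch under an assumption the slide leaves open: each CPD raises an alarm when its score exceeds a per-CPD threshold (the `thresholds` argument is hypothetical).

```python
def majority_vote(scores_per_cpd, thresholds):
    """Combine the outputs of k CPDs by majority voting.

    scores_per_cpd: k equally long score sequences, one per parameter set.
    thresholds: k alarm thresholds (hypothetical; how each CPD raises an
    alarm is not specified on the slide).
    Returns one bool per time step: True if more than half of the CPDs alarm.
    """
    k = len(scores_per_cpd)
    return [
        2 * sum(s > th for s, th in zip(column, thresholds)) > k
        for column in zip(*scores_per_cpd)  # one column = all CPD scores at time t
    ]
```

For example, with three CPDs whose scores at one time step are 5, 6 and 0 and all thresholds equal to 1, two of three CPDs alarm, so the vote is True.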
Q: When can we share operators? (1/2)
Preparation: each CPD has 6 (= 3 + 3) parameters, 3 for each learning stage
[Figure: input packet → 1st stage learning → compute outlier score → 2nd stage learning → compute outlier score]
Q: When can we share operators? (2/2)
-- Branch or merge --
[Figure: three ways of combining the operators (1st stage learning, compute outlier score, 2nd stage learning) of several CPDs: branch only, branch & merge, and merge only]
• An operator can be shared only if both its parameters (α, ...) and its input values (incoming arc) are the same.
• Merging is NOT allowed in this scheme, since different parents may produce different output values.
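The sharing rule can be expressed as hash-consing of operator nodes: a node is reused only if its kind, its own parameters and its input node are all identical, so plans may branch but never merge. The sketch below is hypothetical (`OperatorRegistry` and `add_cpd_query` are not from the paper) and groups the six parameters per operator as in the 4 sharing patterns: (α_R, α_K), (α_T), (β_R, β_K), (β_T).

```python
class OperatorRegistry:
    """Hash-consing of operator nodes (hypothetical helper)."""

    SOURCE = -1  # id of the shared input stream

    def __init__(self):
        self.nodes = {}  # (kind, params, input_id) -> node id

    def get_or_create(self, kind, params, input_id):
        # Reuse only if kind, parameters AND input node all match (branch, no merge).
        key = (kind, params, input_id)
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]


def add_cpd_query(reg, a_r, a_k, a_t, b_r, b_k, b_t):
    """Register one CPD as a chain of 4 operators; returns the id of its last node."""
    n1 = reg.get_or_create("learn1", (a_r, a_k), reg.SOURCE)
    n2 = reg.get_or_create("score1", (a_t,), n1)
    n3 = reg.get_or_create("learn2", (b_r, b_k), n2)
    return reg.get_or_create("score2", (b_t,), n3)


reg = OperatorRegistry()
add_cpd_query(reg, .02, 2, 5, .02, 3, 5)
add_cpd_query(reg, .02, 2, 7, .02, 3, 5)
print(len(reg.nodes))
```

The two queries share only the 1st stage learning: their 2nd stage learning operators have identical parameters but different inputs, so they are not merged, and the registry ends up with 7 nodes rather than 8.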
The 4 sharing patterns
-- Only branch cases, no merge --
[Figure: operator graphs for Patterns 1 to 4, in which progressively more of the 1st stage learning, compute outlier score and 2nd stage learning operators are shared]
NOTE: “1st stage learning” and “2nd stage learning” can be divided into sub-operators, and some of these sub-operators can also be shared. These finer sharing patterns are described in the paper.
Pattern 1: Share CPD-1 if α_R and α_K are the same.
Pattern 2: Share CPD-1 and CPD-2 if α_R, α_K and α_T are the same.
Pattern 3: Share CPD-1 to CPD-3 if α_R, α_K, α_T, β_R and β_K are the same.
Pattern 4: Share CPD-1 to CPD-4 if all six parameters α_R, α_K, α_T, β_R, β_K and β_T are the same.
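Because sharing only branches, the four patterns reduce to a prefix test: CPD-i can be shared exactly when a growing prefix of the parameter tuple matches, and once an upstream operator differs, nothing downstream can be shared. A minimal sketch, assuming parameters are ordered (α_R, α_K, α_T, β_R, β_K, β_T); `shared_operators` is a hypothetical helper.

```python
# Parameter order assumed: (alpha_R, alpha_K, alpha_T, beta_R, beta_K, beta_T).
# CPD-i can be shared when the first PREFIX_LEN[i-1] parameters match (Patterns 1-4).
PREFIX_LEN = (2, 3, 5, 6)


def shared_operators(p, q):
    """Number of leading operators (CPD-1, CPD-2, ...) two CPDs can share (0..4)."""
    shared = 0
    for length in PREFIX_LEN:
        if tuple(p[:length]) != tuple(q[:length]):
            break  # no merge: everything downstream stays separate
        shared += 1
    return shared


base = (.02, 2, 5, .02, 3, 5)
print(shared_operators(base, (.02, 2, 7, .02, 3, 5)))  # 1 (Pattern 1)
print(shared_operators(base, (.02, 2, 5, .02, 3, 7)))  # 3 (Pattern 3)
```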
Experiment
1. Measuring the reduction ratio provided by MQO
– 3 kinds of parameter sets
• Grid style
• Random
• Uniform (only to see the ideal case)
– Implemented a system to measure the reduction ratio
– Measured the reduction ratio
2. Measuring execution time when sharing only the 1st stage learning
– Implemented CPD in C++ with the Eigen library (for matrix manipulation)
– Measured execution time using this CPD implementation
[Figure: grid-style parameter sets]
Reduction ratio provided by sharing (result)
Parameter pattern | # Queries | Naïve (# operators) | Sharing (# operators) | Performance gain (reduction ratio)
Uniform | 64 | 384 | 6 | 98.4 %
Random (2 values) | 64 | 384 | 101 | 73.7 %
Random (10 values) | 64 | 384 | 315 | 18.0 %
Random (100 values) | 64 | 384 | 366 | 4.7 %
Grid style (N = 2) | 64 | 384 | 126 | 67.2 %
Grid style (N = 4) | 4096 | 24576 | 5460 | 77.8 %
Grid style (N = 8) | 262144 | 1572864 | 299592 | 81.0 %
Reduction ratio = 1 - (# operators with sharing) / (# operators, naïve).
Sharing is effective for grid-style parameter sets but ineffective for random ones. Hence, an effective approach is to choose the CPD parameter sets in grid style and to randomly sample their results for aggregation.
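The grid and uniform rows can be reproduced with a simple counting model. We assume each CPD is a chain of 6 operators (the Naïve column is 6 × # queries), where the i-th operator depends on the first i parameters in the order α_R, α_K, α_T, β_R, β_K, β_T; the number of operators after sharing is then the number of distinct parameter prefixes. For a full grid with N values per parameter this is N + N² + ... + N⁶. The helpers below are hypothetical; the random rows depend on the sampled values and are not reproduced here.

```python
from itertools import product

NUM_PARAMS = 6  # (alpha_R, alpha_K, alpha_T, beta_R, beta_K, beta_T)


def count_operators(param_sets):
    """Return (#operators without sharing, #operators with prefix sharing)."""
    naive = NUM_PARAMS * len(param_sets)
    # The i-th operator of a query is identified by its first i parameters.
    shared = len({tuple(p[:i]) for p in param_sets
                  for i in range(1, NUM_PARAMS + 1)})
    return naive, shared


def grid(n):
    """Grid style: every combination of n candidate values per parameter."""
    return list(product(range(n), repeat=NUM_PARAMS))


def grid_shared_closed_form(n):
    """Distinct prefixes of a full grid: n + n^2 + ... + n^6."""
    return sum(n ** i for i in range(1, NUM_PARAMS + 1))


for n in (2, 4):
    naive, shared = count_operators(grid(n))
    print(f"N={n}: naive={naive}, shared={shared}, reduction={1 - shared / naive:.1%}")
# N=2: naive=384, shared=126, reduction=67.2%
# N=4: naive=24576, shared=5460, reduction=77.8%
```

The closed form gives 299592 for N = 8, matching the table without enumerating 262144 parameter sets, and 64 identical (uniform) parameter sets collapse to 6 operators.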
Execution time when sharing the 1st stage learning (result)
ID | Parameters | Naive (s) | Shared CPD-1 (s) | Performance gain, shared CPD-1 (times)
1 | .02 2 5 .02 3 5 | 2.92 | 1.77 | 1.65
2 | .02 4 5 .02 3 5 | 3.65 | 1.77 | 2.06
3 | .02 2 5 .02 4 5 | 3.29 | 2.17 | 1.52
4 | .005 2 5 .02 4 5 | 2.91 | 1.76 | 1.65
5 | .02 2 5 .005 3 5 | 2.89 | 1.77 | 1.64
6 | .02 2 7 .02 7 5 | 3.00 | 1.87 | 1.60
7 | .02 1 5 .02 1 5 | 1.96 | 1.11 | 1.78
8 | .02 10 10 .02 10 10 | 11.2 | 5.84 | 1.92
9 | .02 1 10 .02 10 10 | 6.66 | 5.80 | 1.15
10 | .02 10 10 .02 1 10 | 6.68 | 1.34 | 5.00
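The performance gain column is the ratio of naive to shared execution time. The short check below recomputes it from the times in the table; recomputed values for IDs 5, 7 and 10 (1.63, 1.77, 4.99) differ from the table in the last digit, possibly because the published gains were computed from unrounded times. `ROWS` and `gain` are illustrative names only.

```python
# (naive_seconds, shared_cpd1_seconds) for IDs 1-10, copied from the table
ROWS = {
    1: (2.92, 1.77), 2: (3.65, 1.77), 3: (3.29, 2.17), 4: (2.91, 1.76),
    5: (2.89, 1.77), 6: (3.00, 1.87), 7: (1.96, 1.11), 8: (11.2, 5.84),
    9: (6.66, 5.80), 10: (6.68, 1.34),
}


def gain(naive, shared):
    """Performance gain = naive execution time / shared execution time."""
    return naive / shared


for i, (naive, shared) in ROWS.items():
    print(i, f"{gain(naive, shared):.2f}")
```

The largest gain is for ID 10 and the smallest for ID 9, whose parameter columns differ only by swapping the values 1 and 10.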
Related Work
• Philosophy
– In-DB analytics (MADlib, Bismarck, Oracle R Enterprise)
• Acceleration
– Advanced hardware (GPGPU, FPGA, Xeon Phi)
• Combining MQO with such hardware acceleration is a promising direction
Conclusions
• Multiple query optimization for CPD
– 4 sharing patterns
• Experimental result
– Up to 5 times faster than the naïve approach (best case)
• Future work
– Integrating MQO with hardware accelerators

BIRTE-13-Kawashima
