Introduction to eTaPR for HAICon (English)

Enhanced TaPR (eTaPR)
Accuracy Metric for Anomaly Detection
on Time-Series Data
2021. 8. 12.
Won-Seok Hwang
hws23@nsr.re.kr

Accuracy Evaluation on Non-Time-Series Data
• Evaluation setting
– Training a detection method on the training dataset
– Detecting “anomalies” in the test dataset
• The test dataset includes many anomalies
• The detection method generates “predictions” that point out the anomalies
• Accuracy of detection
– Fraction of all anomalies that are detected (i.e., recall)
– Fraction of all predictions that are correct (i.e., precision)
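
As a minimal sketch of the point-wise setting described above, precision and recall can be computed from two 0/1 lists of equal length. The function name and the example labels are illustrative assumptions, not part of the eTaPR package.

```python
def precision_recall(anomalies, predictions):
    """Point-wise precision and recall for two 0/1 lists of equal length."""
    if len(anomalies) != len(predictions):
        raise ValueError("both lists must have the same length")
    # A true positive is a point that is anomalous and also predicted as anomalous.
    true_positives = sum(1 for a, p in zip(anomalies, predictions) if a == 1 and p == 1)
    n_predicted = sum(predictions)
    n_anomalous = sum(anomalies)
    # Returning 0.0 for an empty denominator is a choice made for this sketch.
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_anomalous if n_anomalous else 0.0
    return precision, recall


# Illustrative labels: 3 anomalous points, 3 predicted points, 2 in common.
example_anomalies = [0, 1, 1, 0, 1, 0]
example_predictions = [0, 1, 0, 1, 1, 0]
print(precision_recall(example_anomalies, example_predictions))
```

Here every point is either detected or not, and every prediction is either correct or not, which is exactly the two-case view that does not carry over to ranges.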

Necessity of an Accuracy Metric for Time-Series Data
• For non-time-series data (e.g., binary classification or information retrieval)
– An anomaly or a prediction always falls into one of only two cases
• An anomaly can be (1) detected or (2) not
• A prediction can be (1) correct or (2) not
• For time-series data
– Only a part of an anomaly can be detected
– Only a part of a prediction can be correct
– Because an anomaly or a prediction is represented as a range in time-series data

Characteristics of Anomalies in Time-Series Data
• Reason why an anomaly is a range in time-series data
– An anomalous event (e.g., an incident or a fraud) causes a series of values whose patterns are similar
– It is more reasonable to regard this series of values as a single anomaly
• Reason why a prediction is a range
– A human operator perceives a series of consecutive predictions as a single prediction that indicates a range
[Figure: observed values along the time axis (𝑡1 … 𝑡11). An intrusion event (anomaly) causes a run of similar values, and the range 𝑡7 - 𝑡11 is regarded as a single anomaly.]
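
The idea of treating a run of anomalous points as one range can be sketched as follows. This is a hypothetical helper, not a function of the eTaPR package; it assumes labels are given as a 0/1 list and reports each run of 1s as an inclusive (start, end) pair of 0-based indices.

```python
def to_ranges(labels):
    """Return an inclusive (start, end) index pair for each run of 1s."""
    ranges = []
    start = None
    for i, value in enumerate(labels):
        if value == 1 and start is None:
            start = i                      # a new run begins here
        elif value != 1 and start is not None:
            ranges.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:
        ranges.append((start, len(labels) - 1))  # run reaches the end of the list
    return ranges


# Eleven points t1..t11 stored at indices 0..10; t7..t11 are anomalous.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(to_ranges(labels))  # [(6, 10)]
```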

Evaluation by Comparing Ranges (Idea 1)
• Case where only a part of an anomaly is detected
– Evaluating how likely each anomaly is to be detected (Idea 1)
• If a person understands more than a certain portion of an anomaly, s/he can find its whole range
– Because the operator tries to find an anomaly by analyzing a given prediction
– Anomalies 𝑎2 and 𝑎3 are likely to be detected in the figure below
– Giving a non-zero score to those anomalies of which more than a certain portion is detected
• A given parameter (𝜃𝑟) determines this portion
• The larger the portion of an anomaly an operator understands, the more likely s/he is to detect its whole range
– 𝑎3 is detected more easily than 𝑎2
– Giving each anomaly a score proportional to its detected portion
[Figure: anomaly ranges 𝑎1, 𝑎2, 𝑎3 and prediction ranges 𝑝1, 𝑝2, 𝑝3 along the time axis. 𝑎1: hard to detect; 𝑎2: likely to be detected; 𝑎3: more likely to be detected.]
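
Idea 1 can be illustrated with a simplified score: an anomaly gets its detected portion as a score once that portion reaches 𝜃𝑟, and zero otherwise. This is only a sketch under stated assumptions, not the exact eTaR formula. The function names, the use of `>=` for the threshold, and the example ranges are all assumptions; ranges are inclusive (start, end) pairs and predictions are assumed not to overlap each other.

```python
def overlap(a, b):
    """Number of time points shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def detection_score(anomaly, predictions, theta_r):
    """Detected portion of the anomaly if it reaches theta_r, otherwise 0."""
    length = anomaly[1] - anomaly[0] + 1
    covered = sum(overlap(anomaly, p) for p in predictions)
    portion = covered / length
    return portion if portion >= theta_r else 0.0


theta_r = 0.5
print(detection_score((0, 9), [(0, 0)], theta_r))      # 0.0, too little is detected
print(detection_score((20, 29), [(18, 25)], theta_r))  # 0.6
print(detection_score((40, 49), [(40, 48)], theta_r))  # 0.9
```

The three calls mirror the three situations in the figure: hard to detect, likely to be detected, and more likely to be detected.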

• Case where only a part of a prediction is correct
– Evaluating how useful each prediction is likely to be for the detection (Idea 2)
• A prediction that identifies more than a certain portion of anomalies is useful for a person
– A person would analyze the whole range of a prediction even if some part of it incorrectly flags a normal range
– 𝑝2 and 𝑝3 are useful for detecting anomalies in the figure below
– Giving non-zero scores to those predictions of which more than a certain portion correctly identifies anomalies
• A given parameter (𝜃𝑝) determines this portion
• The larger the portion of a prediction that identifies an anomaly, the more useful the prediction is for the detection
– 𝑝3 is more useful than 𝑝2
– Giving each prediction a score proportional to the portion of it that identifies anomalies correctly
[Figure: anomaly ranges 𝑎1, 𝑎2, 𝑎3 and prediction ranges 𝑝1, 𝑝2, 𝑝3 along the time axis. 𝑝1: useless for detection; 𝑝2: useful for detection; 𝑝3: more useful for detection.]
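
Idea 2 can be sketched directly on a 0/1 label list: the score of a prediction range is the fraction of its points that are truly anomalous, provided that fraction reaches 𝜃𝑝. This is an illustrative simplification rather than the exact eTaP formula. The function names, the `>=` threshold, and the example data are assumptions, and the prediction range is assumed to lie inside the label list.

```python
def correct_portion(prediction, anomaly_labels):
    """Fraction of points in an inclusive (start, end) range that are labeled 1."""
    start, end = prediction
    points = anomaly_labels[start:end + 1]
    return sum(points) / len(points)


def usefulness_score(prediction, anomaly_labels, theta_p):
    """Correct portion of the prediction if it reaches theta_p, otherwise 0."""
    portion = correct_portion(prediction, anomaly_labels)
    return portion if portion >= theta_p else 0.0


anomaly_labels = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
theta_p = 0.5
print(usefulness_score((0, 2), anomaly_labels, theta_p))  # 0.0, mostly a normal range
print(usefulness_score((1, 4), anomaly_labels, theta_p))  # 0.75
print(usefulness_score((2, 5), anomaly_labels, theta_p))  # 1.0
```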

• Evaluation of detection-failure cases
– Only detection-success cases should get a non-zero score
– With Ideas 1 and 2 alone, detection-failure cases can also get a non-zero score
• A prediction is evaluated as useful even though it detects no anomaly (see 𝑝1 and 𝑎1)
• An anomaly is evaluated as detected even though no prediction is useful for detecting it (see 𝑝2 and 𝑎2)
• Success of detection depends on both predictions and anomalies (Idea 3)
– When anomalies and predictions are not ranges, this idea does not need to be considered
• If a prediction identifies an anomaly, there is, of course, always one detected anomaly
[Figure: anomaly ranges 𝑎1, 𝑎2 and prediction ranges 𝑝1, 𝑝2 along the time axis.]
– 𝑝1 detects no anomaly because it identifies too small a portion of 𝑎1 (not enough information) to understand 𝑎1. 𝑝1 seems useful when only Idea 2 is considered.
– Most of 𝑝2 fails to identify any anomaly, so it is very hard to detect 𝑎2 with 𝑝2. 𝑎2 seems detected when only Idea 1 is considered.
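
One way to picture Idea 3 is a pairwise check that requires both conditions at once: the prediction must cover enough of the anomaly (Idea 1) and enough of the prediction must fall inside the anomaly (Idea 2). The sketch below is a deliberately simplified assumption and not the procedure implemented in the eTaPR package; names, thresholds, and ranges are illustrative.

```python
def overlap(a, b):
    """Number of time points shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def length(r):
    """Number of time points in an inclusive (start, end) range."""
    return r[1] - r[0] + 1


def is_detection_success(anomaly, prediction, theta_r, theta_p):
    """True only if both the Idea 1 and the Idea 2 conditions hold."""
    shared = overlap(anomaly, prediction)
    covers_anomaly = shared / length(anomaly) >= theta_r        # Idea 1
    mostly_correct = shared / length(prediction) >= theta_p     # Idea 2
    return covers_anomaly and mostly_correct                    # Idea 3


# p1 touches only a tiny part of a1: useful by Idea 2 alone, but a1 is not detected.
print(is_detection_success((0, 9), (9, 10), 0.5, 0.5))     # False
# p2 covers all of a2 but is mostly incorrect: a2 looks detected by Idea 1 alone.
print(is_detection_success((20, 23), (20, 39), 0.5, 0.5))  # False
# Both conditions hold.
print(is_detection_success((50, 59), (52, 60), 0.5, 0.5))  # True
```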

• A long incorrect prediction is penalized more than a short incorrect one (Idea 4)
– A person has to spend time proportional to the length of a prediction to check whether anomalies occurred
• A long incorrect prediction requires more human effort
• On the other hand, we do not consider the length of anomalies
– For instance, the length of a cyber attack is unrelated to its effect
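
A simple way to make a long incorrect prediction cost more than a short one is to weight each prediction's score by its length when averaging. The sketch below assumes plain length weighting, which is an illustration of Idea 4 and not necessarily the weighting used by eTaP; the function name and example values are hypothetical.

```python
def length_weighted_average(predictions, scores):
    """Average of per-prediction scores, weighted by prediction length."""
    lengths = [end - start + 1 for start, end in predictions]
    total = sum(lengths)
    if total == 0:
        return 0.0  # no predictions at all; a choice made for this sketch
    return sum(n * s for n, s in zip(lengths, scores)) / total


# One correct prediction (score 1.0) plus one incorrect prediction (score 0.0).
short_miss = length_weighted_average([(0, 9), (20, 24)], [1.0, 0.0])
long_miss = length_weighted_average([(0, 9), (20, 119)], [1.0, 0.0])
print(round(short_miss, 3), round(long_miss, 3))
```

With an unweighted mean both cases would score 0.5; with length weighting the case containing the 100-point incorrect prediction scores much lower than the one containing the 5-point incorrect prediction.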

Proposed Accuracy Metric
• Enhanced Time-series aware Recall (eTaR)
– Average likelihood of detection over all anomalies in the test dataset
– Based on Ideas 1 and 3
• Enhanced Time-series aware Precision (eTaP)
– Average usefulness of all predictions produced by a detection method
– Based on Ideas 2, 3, and 4
• eTaF1
– The harmonic mean of eTaP and eTaR
– Your rank is determined by eTaF1!
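
Since eTaF1 is described as the harmonic mean of eTaP and eTaR, the combination step can be sketched as below. The function name is hypothetical, and returning 0.0 when both inputs are zero is an assumption; how the package handles that case is not stated here.

```python
def harmonic_mean(etap, etar):
    """Harmonic mean of two scores in [0, 1]."""
    if etap + etar == 0:
        return 0.0  # assumption: avoid division by zero when both scores are 0
    return 2 * etap * etar / (etap + etar)


print(round(harmonic_mean(0.5, 1.0), 3))  # 0.667
```

The harmonic mean stays close to the smaller of the two values, so a method cannot rank well by maximizing only eTaP or only eTaR.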

Reference
• To understand Ideas 1 and 2, see the paper below:
– W.-S. Hwang et al., “Time-Series Aware Precision and Recall for Anomaly Detection: Considering Variety of Detection Result and Addressing Ambiguous Labeling,” in Proc. of CIKM, pp. 2241-2244, 2019.
• eTaPR is an enhanced version of TaPR that additionally employs Ideas 3 and 4

How to use
• Installation
– Command: python -m pip install eTaPR-[version]-py3-none-any.whl
• Execution
– TaPR_pkg.etapr.evaluate_haicon(anomalies: list, predictions: list) -> dict
• anomalies
– A list of 0s and 1s
– 0 indicates normal and 1 indicates an anomaly
• predictions
– A list of 0s and 1s
– 0 indicates that your prediction is normal and 1 that your prediction is an anomaly
• Returns a dictionary with the keys 'tar', 'tap', and 'f1'
– e.g.:
result = TaPR_pkg.etapr.evaluate_haicon(anomalies_list, predictions_list)
result['tar'], result['tap'], result['f1']
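
Before calling the evaluation function, it may help to check the two input lists. The helper below is hypothetical and not part of the eTaPR package. It assumes that both lists label the same time points and therefore should have equal length, which is a reasonable guess rather than a documented requirement.

```python
def check_labels(anomalies, predictions):
    """Raise ValueError unless both lists have equal length and hold only 0 or 1."""
    if len(anomalies) != len(predictions):
        raise ValueError(
            f"length mismatch: {len(anomalies)} anomalies vs {len(predictions)} predictions"
        )
    for name, values in (("anomalies", anomalies), ("predictions", predictions)):
        bad = [v for v in values if v not in (0, 1)]
        if bad:
            raise ValueError(f"{name} contains values other than 0 or 1: {bad[:5]}")
    return True


anomalies_list = [0, 0, 1, 1, 1, 0]
predictions_list = [0, 1, 1, 1, 0, 0]
check_labels(anomalies_list, predictions_list)

# With the package installed, the call from the slide would follow:
# import TaPR_pkg.etapr
# result = TaPR_pkg.etapr.evaluate_haicon(anomalies_list, predictions_list)
```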

Appendix: Why We Do Not Consider Precision and Recall
• Precision and recall are the most well-known accuracy metrics
• They fail to reflect the variety of detected anomalies
– Method 2 gets higher scores than Method 1 even though it detects only 𝑎1

Method  Precision  Recall
1       0.67       0.40
2       1.00       0.67
