Outlier detection in audit logs for application systems
10766012 陳遠任
• H. D. Kuna, Argentina
• R. García-Martinez, Argentina
• F. R. Villatoro, Spain
• Information Systems, Volume 44, August 2014
• Keywords: data mining, systems audit, outlier detection
1. Introduction
• Systems auditing is composed of a series of
tasks aimed at ensuring that all information
systems within an organization function
properly and at providing the basis that
enables corporations to fulfill their strategic
objectives.
• Auditing as a whole ensures that information systems meet the organization's requirements and provide correct information.
• Audit logs contain records of every operation
carried out within a software information
system and play a key role in guaranteeing that
each organization's procedures and
regulations are observed.
• Audit logs (audit trails / working papers) record every operation in the software and serve to ensure that the relevant rules are followed.
• Real databases contain anomalies related to
different causes, including errors in data
collection, errors in the information systems,
probable malicious actions, and so on.
• Anomalies include data entry errors, system errors, malicious operations, and so on.
• This paper aims to introduce a process that
employs data mining techniques to automate
outlier detection in system audit logs that
include alphanumeric data. Automated
detection can allow an auditor to detect hints of
anomalous activities, which will most likely
require closer scrutiny.
• This paper introduces the relevant data mining techniques and offers recommendations.
1.1 Related work: data mining in systems auditing
• Computer-Assisted Auditing Techniques
(CAATs) make it possible to use computers as
part of the auditing process.
• CAATs help us carry out audits with computers.
• There are other options as well:
• Data analysis software.
• Network security assessment software.
• Assessment software for operating systems and database management systems.
• Software and source code testing tools.
• Clustering is a data mining technique that may be employed for outlier detection.
• Several clustering techniques are available, including the following:
1. Hierarchical clustering
2. Partitioning methods, which minimize the deviation within each cluster
3. Density-based clustering, which groups points according to density
• However, procedures have not been formally
established to construct a system auditing tool
from data mining techniques applicable to
alphanumeric fields, which is the goal of this
article. Furthermore, the tool we develop must
not require its user to be an expert in data
mining.
• A methodology for applying data mining to auditing has not yet been established, and auditors should not be required to become data mining experts.
3. Materials and methods
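1. Detection of outliers with unsupervised learning.
2. Detection of outliers with supervised learning.
3. Detection of outliers with semisupervised learning. During clustering, the labeled data are first used to draw a boundary, and the overall distribution of the remaining unlabeled data is then used to adjust it into a new boundary between the two classes. This keeps the high degree of automation of unsupervised learning while lowering the cost of labeling data.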
Considerations when selecting the methods:
• The ability of the algorithm to produce results that are comprehensible for the final user.
• The efficacy in its detection of outliers.
• The false positive rate.
• The compatibility of the algorithms with one another and with the objectives of the procedure.
• The expected improvement in efficacy from combining several techniques, compared with using them separately.
• That the algorithm can operate on alphanumeric data.
• That the algorithms do not require a large number of parameters and can be easily automated. This is very important when no auditors with data mining expertise are available.
• Finally and importantly, that the algorithms are capable of improving their capacity to specifically detect outliers.
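3.1.1. Outlier detection algorithms
• LOF (Local Outlier Factor): density-based detection, unsupervised.
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): density-based detection, unsupervised.
• DB-Outliers (Distance-Based Outliers): detection based on the distances between data points.
• COF (Class Outlier Factor): similar to DB-Outliers, but detection is carried out per class.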
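The slide describes LOF only as a density-based, unsupervised method. As a rough illustration, the sketch below computes LOF scores in pure Python following the standard definition (k-distance, reachability distance, local reachability density). The function name `lof_scores`, the Euclidean distance, `k=2`, and the toy points are assumptions made for this example; the paper works on alphanumeric audit data, which would need a different distance measure.

```python
import math

def lof_scores(points, k=2):
    """Return the Local Outlier Factor of each point (illustrative sketch).

    Assumes numeric points, Euclidean distance, k < len(points),
    and no duplicate points (duplicates can make a density infinite).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []     # distance from each point to its k-th nearest neighbor
    neighbors = []  # indices inside each point's k-distance neighborhood
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        # Points tied at the k-distance are kept in the neighborhood.
        neighbors.append([j for j in order if dist[i][j] <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach)

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data: four points on a unit square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

On this toy data the four square corners score 1.0 (their density matches that of their neighbors), while the isolated point scores far above 1, which is the behavior a threshold on LOF relies on.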
3.2.1. Selection of outlier detection specific algorithms
• Testing and selection were based on an
artificial database that was created in
accordance with the guidelines set forth by
several authors.
• The test database was created artificially, with outliers placed at random.
• Table 1 details the characteristics of the test
database, as follows:
• DB-Outliers and COF performed excellently,
but they were discarded as they required that
the total number of outliers to be detected be
provided beforehand. As this number is
unknowable in real conditions, LOF and
DBSCAN were chosen instead.
3.2.1.1. Merging results from LOF and DBSCAN
• If “LOF” ≤ 1.2, then “LOF_value” is “0”.
• If “LOF” > 1.2, then “LOF_value” is “1”.
• If the tuple belongs to a cluster other than cluster 0, “DBSCAN_value” = “0”.
• If the tuple belongs to cluster 0, “DBSCAN_value” = “1”.
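The four rules above can be sketched as two small functions. This is a minimal illustration, not the authors' implementation: the row layout (dictionaries with hypothetical keys `"LOF"` and `"cluster"`) and the helper `add_flags` are assumptions. The slide does not say why cluster 0 is special; presumably it is the group in which the DBSCAN tool used collects noise points. Libraries differ here: scikit-learn's DBSCAN, for instance, labels noise as -1.

```python
LOF_THRESHOLD = 1.2    # cut-off quoted on the slide
FLAGGED_CLUSTER = 0    # cluster id that the slide maps to DBSCAN_value = 1

def lof_value(lof):
    """Return 1 when the LOF score exceeds the threshold, else 0."""
    return 1 if lof > LOF_THRESHOLD else 0

def dbscan_value(cluster_id):
    """Return 1 when the tuple was assigned to cluster 0, else 0."""
    return 1 if cluster_id == FLAGGED_CLUSTER else 0

def add_flags(rows):
    """Return copies of the rows with both flag attributes added.

    Each row is assumed to be a dict with a numeric "LOF" score and an
    integer "cluster" id (hypothetical key names for this sketch).
    """
    flagged = []
    for row in rows:
        out = dict(row)
        out["LOF_value"] = lof_value(row["LOF"])
        out["DBSCAN_value"] = dbscan_value(row["cluster"])
        flagged.append(out)
    return flagged

rows = [{"LOF": 1.05, "cluster": 2}, {"LOF": 3.40, "cluster": 0}]
flagged = add_flags(rows)
```

How the two flags are then combined into “outlier_type” is not spelled out on the slide, so the sketch stops at the flags.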
3.2.1.3. Combining LOF and DBSCAN
• Table 3 shows that combining the two algorithms improved outlier detection by 4% over LOF alone and by 24% over DBSCAN alone. False positives were reduced by 1% when compared to LOF and by 4% when compared to DBSCAN.
3.2.2.1. Classification algorithms combination
• C4.5 kept the efficacy level intact but did not affect the percentage of false positive results.
• Bayesian Network (BN) eliminated all false positives but drastically affected outlier detection efficacy.
• PART gave good results with its own model; the aim of combining the classifiers is to obtain the best possible global results through their respective models.
• If any tuple meets none of these conditions,
then “outlier_type” = clean.
3.3. Designing the proposed process
A. Read and pre-process the database.
B. Apply LOF. Add the “LOF_value” attribute.
C. Apply DBSCAN. Add the “DBSCAN_value” attribute.
D. Merge the results. Add the “outlier_type” attribute.
E. Read the database
F. Apply C4.5
G. Apply BN
H. Apply PART
I. Merge the results.
J. Apply the rules for outlier tuple determination
as per the criteria
K. Save the final results of the “outlier_type”
target attribute for each tuple; this can be
either “clean” or “outlier”.
L. End the procedure.
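Steps A to L can be read as a two-stage pipeline: an unsupervised stage that labels each tuple, followed by a stage in which three classifiers each give a verdict that is merged into the final “outlier_type”. The skeleton below is only a sketch of that control flow. Every callable passed to `run_process` is a hypothetical stand-in: the slides do not give the merge rule of step D or the decision rules of step J, so the lambdas in the usage example are placeholders, not the paper's rules.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def run_process(
    rows: List[Row],
    lof_flag: Callable[[Row], int],
    dbscan_flag: Callable[[Row], int],
    merge_flags: Callable[[int, int], str],
    classifiers: Dict[str, Callable[[Row], str]],
    decide: Callable[[Dict[str, str]], str],
) -> List[Row]:
    # Stage 1, steps A-D: flag each tuple and derive a first label.
    labelled = []
    for row in rows:
        out = dict(row)                                  # A (pre-processing omitted)
        out["LOF_value"] = lof_flag(row)                 # B
        out["DBSCAN_value"] = dbscan_flag(row)           # C
        out["outlier_type"] = merge_flags(
            out["LOF_value"], out["DBSCAN_value"])       # D
        labelled.append(out)

    # Stage 2, steps E-K: collect one verdict per classifier, then decide.
    results = []
    for row in labelled:                                 # E
        verdicts = {name: clf(row) for name, clf in classifiers.items()}  # F-I
        final = dict(row)
        final["outlier_type"] = decide(verdicts)         # J, K
        results.append(final)
    return results                                       # L

# Usage with placeholder rules (assumptions, not taken from the paper).
rows = [{"LOF": 1.0, "cluster": 1}, {"LOF": 3.4, "cluster": 0}]
result = run_process(
    rows,
    lof_flag=lambda r: int(r["LOF"] > 1.2),
    dbscan_flag=lambda r: int(r["cluster"] == 0),
    merge_flags=lambda a, b: "outlier" if a and b else "clean",
    # Stand-ins that echo the stage-1 label; real classifiers would be trained models.
    classifiers={name: (lambda r: r["outlier_type"]) for name in ("C4.5", "BN", "PART")},
    decide=lambda verdicts: "outlier" if "outlier" in verdicts.values() else "clean",
)
```

Passing the rules in as functions keeps the skeleton honest about what the slides leave unspecified, and lets each rule be swapped without touching the control flow.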
4. Experimentation on real databases
4.1. Academic management system of a university
• The experiment used a database from a university student management system and investigated the audit records from “Exam Management”, “Course Management”, and “Enrollment Management”.
• A minimum efficacy of 65% and a 1%
maximum for false positives were established
after consultation with the aforementioned
experts, two system administrators who work
on the student management system and have
ample experience in academic management
systems.
• Targets: an efficacy of at least 65% and a false positive rate of at most 1%.
• The data selected for the test came from the
year 2000, as it was determined by the experts
that most anomalous operations occurred
during that year.
• To clarify the meaning of outliers in the
academic management system, let us list
some examples of activities that the experts
consider as anomalies:
1. Activities recorded in the audit log during holidays or outside the personnel's shifts.
2. Operations that do not match the profile or permissions of a given user.
3. Activities that go against the internal regulations defined by the university.
4. Data recorded outside the dates established in the university calendar.
4.4. Result
• The efficacy was always over 66% (above the 65% minimum), with a mean value of 76%. The false positive rate was below 0.67% in all cases (under the 1% maximum), with a minimum value of 0.10%.
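To make the two figures concrete, the sketch below computes an efficacy and a false positive rate from expert labels and checks them against the 65% and 1% targets. The definitions used are assumptions, since the slides do not state them: efficacy is taken as the share of true outliers that were detected, and the false positive rate as the share of clean tuples that were flagged. The function names and the toy labels are hypothetical.

```python
def efficacy(true_labels, predicted):
    """Share of true outliers that were flagged (assumed definition)."""
    hits = [p for t, p in zip(true_labels, predicted) if t == "outlier"]
    return sum(p == "outlier" for p in hits) / len(hits)

def false_positive_rate(true_labels, predicted):
    """Share of clean tuples that were flagged (assumed definition)."""
    clean = [p for t, p in zip(true_labels, predicted) if t == "clean"]
    return sum(p == "outlier" for p in clean) / len(clean)

def meets_targets(true_labels, predicted, min_efficacy=0.65, max_fpr=0.01):
    """Check both rates against the targets agreed with the experts."""
    return (efficacy(true_labels, predicted) >= min_efficacy
            and false_positive_rate(true_labels, predicted) <= max_fpr)

# Toy example: 4 true outliers (3 detected) and 200 clean tuples (1 flagged).
true_labels = ["outlier"] * 4 + ["clean"] * 200
predicted = ["outlier"] * 3 + ["clean"] + ["outlier"] + ["clean"] * 199
ok = meets_targets(true_labels, predicted)
```

In the toy example the efficacy is 3/4 = 75% and the false positive rate is 1/200 = 0.5%, so both targets are met.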
• [Figure: classification of the types of outliers detected by applying our procedure to the academic management database.]
5. Conclusions
• Based on these findings, we can conclude that
the data mining-algorithm merged approach
can be considered to be a resounding success,
allowing us to develop a process and to apply it
on audit tables from a real database and thus
to facilitate the system auditor's job.
• Combining data mining methods was successful: it allowed us to develop a process and apply it to a real database, which in turn eases the auditor's work.
• We have also contemplated the further
optimization of the process' efficacy and would
also like to reduce the rate of false positives to
even lower levels in our future work.
• The hope is to lower the false positive rate further in the future.
• We have analyzed the convenience of employing
fuzzy logic, as in many cases a tuple does not
respond to the two values that our process
establishes.
• Fuzzy logic is being considered because in many cases a tuple does not fit neatly into either of the two values, “clean” or “outlier”.
THANKS!!
