1. Defence of MSc Dissertation
XML Accounting Trail: A model for
introducing digital forensic readiness
to XML Accounting and XBRL
by
Dirk Kotze
22 July 2015
Promotor: Prof. Martin S. Olivier
2. Introduction
21st-century economy: information
Business: a need to make sense of and share financially relevant information (e.g. accounting data)
XML
Rise
Requirements
Lit. review – XML weaknesses
Cyber-Crime
$800.5 million – 2014
Mostly fraud
60% discovered by accident/tip-off (ACFE 2014)
Digital forensics
3. Introduction (2)
Big Data Problem: how does an investigator know whether XML financial accounting data has been modified?
Research problem
XML financial data is susceptible to tampering due to the human-readability property required by the XML data specification. Upon presentation of a set of XML financial data, how can one determine whether data has been tampered with, and reconstruct the past events so that the nature of tampering can be determined?
Purpose
Detect
Reconstruct
Research Method
Method (detecting)
Model (reconstructing)
4. Background
Overview of key topics
Most of you should be familiar with these
A brief discussion of the key concepts necessary to understand the later work
Will be discussing
Digital Forensics
Compilers
Won’t be discussing (assumed familiar, and omitted due to time constraints)
XML
5. Background: Digital Forensics
Definition (McKemmish)
“the application of computer science and investigative
procedures for a legal purpose involving the analysis of
digital evidence after proper search authority, chain of
custody, validation with mathematics, use of validated
tools, repeatability, reporting, and possible expert
presentation”
Economics
Cost
Disruption
Complexity (Data to analyse, Anti-Forensics)
Forensic Readiness
6. Background: Compilers
Stages of compilation
Analysis & Synthesis
Synthesis out of scope
Analysis
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Error handling
Panic mode
Phrase Level
Minimise noise (graph)
Error Productions
Pre-specify known patterns of data irregularities
Global Correction
Determine the minimum change required to make the input correct, identifying the potential data irregularities introduced.
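The analysis stages above can be sketched for the XML accounting setting. Below is a minimal lexical-analysis pass; the token patterns are illustrative assumptions, not the dissertation's actual grammar. Any character that matches no pattern is reported as a lexical error, and the lexer recovers in panic mode by skipping it.

```python
import re

# Token patterns for a simplified XML accounting line (illustrative only).
TOKEN_SPEC = [
    ("OPEN",   r"<[A-Za-z]+>"),      # opening tag, e.g. <Amount>
    ("CLOSE",  r"</[A-Za-z]+>"),     # closing tag, e.g. </Amount>
    ("NUMBER", r"\d+(\.\d+)?"),      # numeric content
    ("TEXT",   r"[A-Za-z][\w\-]*"),  # textual content, e.g. Bank
    ("WS",     r"\s+"),              # whitespace (skipped)
]

def tokenize(line):
    """Lexical analysis: split a line into (kind, lexeme) tokens.
    A character no pattern matches is flagged as a lexical error,
    then skipped (panic-mode recovery) so scanning can continue."""
    tokens, errors, pos = [], [], 0
    while pos < len(line):
        for kind, pattern in TOKEN_SPEC:
            m = re.match(pattern, line[pos:])
            if m:
                if kind != "WS":
                    tokens.append((kind, m.group()))
                pos += m.end()
                break
        else:
            errors.append((pos, line[pos]))  # e.g. a stray '@' inside a number
            pos += 1
    return tokens, errors
```

A syntactic pass would then parse the token stream against the schema, and a semantic pass would check the accounting rules.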
7. Detecting Data Irregularities
Problem Statement
Forensic Pathology
Analysing XML files
Rigid structure and accounting rules
Definition of data irregularity
Any unauthorised modification to XML accounting data that impacts the semantic meaning of the financial accounting content.
How do these occur?
Direct modification (bypassing controls and rules)
Indirect modification (via application) of illegitimate
transaction
Large Data Set Problem
8. Detecting Data Irregularities (2)
Analysing XML files
Trend analysis/pattern analysis
Double entry example
Salami attack example
Manual vs Automated Searching
Automating the search for data irregularities
Compiler Theory
Classification of input based on patterns as well as predefined rule sets; and
Recursive identification of patterns using a decision tree.
Handling of errors
10. Detecting Data Irregularities (4)
How process works
Establish Rule Set
normal transactions (no errors will be noted); and
error productions (patterns of transactions that deviate
from the norm i.e. data irregularities).
Execute Compiler
Results
Disclaimers
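The rule-set step could be sketched as follows. The specific rules and the field layout are assumptions for illustration (amounts are in integer cents to avoid floating-point noise): a normal production accepts balanced double entries, while an error production pre-specifies a known irregularity pattern, here a salami-style shaving of tiny amounts.

```python
def balanced_double_entry(txn):
    """Normal production: debits equal credits within one transaction."""
    debits = sum(a for act, a in txn["entries"] if act == "Debit")
    credits = sum(a for act, a in txn["entries"] if act == "Credit")
    return debits == credits

def salami_pattern(txn):
    """Error production: a suspiciously tiny amount shaved off
    (classic salami attack); amounts are in cents."""
    return any(0 < a < 5 for _, a in txn["entries"])

ERROR_PRODUCTIONS = [("salami", salami_pattern)]

def classify(txn):
    """Run one transaction through the rule set and report what it matched."""
    hits = [name for name, rule in ERROR_PRODUCTIONS if rule(txn)]
    if not balanced_double_entry(txn):
        hits.append("unbalanced")
    return hits or ["normal"]
```

An incomplete or wrong rule set makes the classifier useless, which is exactly the "rule set is key" caveat raised in the conclusion.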
11. Applying Automated Detection of Data Irregularities
Application
Consider sample XML Accounting Format
Example 1: Generic XML accounting data format
<Transaction>
  <ID>101-1</ID>
  <Account>Bank</Account>
  <Action>Credit</Action>
  <Amount>25000</Amount>
  <User>012437</User>
  <Date>6/19/2011 8:25:02 AM</Date>
  <Hash>1a88f9a8293e88c87ae1ae5f8bd63585</Hash>
</Transaction>
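One way the embedded Hash field could be checked is sketched below. The slides do not specify the hashing scheme, so this assumes an MD5 digest over the concatenated field values in document order; a real design would pick the scheme deliberately.

```python
import hashlib
import xml.etree.ElementTree as ET

# Field order is an assumption based on the example transaction above.
FIELDS = ["ID", "Account", "Action", "Amount", "User", "Date"]

def transaction_hash(txn_xml):
    """Recompute the digest over the field values, in document order."""
    txn = ET.fromstring(txn_xml)
    payload = "|".join((txn.findtext(f) or "").strip() for f in FIELDS)
    return hashlib.md5(payload.encode()).hexdigest()

def hash_matches(txn_xml):
    """Compare the recomputed digest against the embedded <Hash> value."""
    txn = ET.fromstring(txn_xml)
    return transaction_hash(txn_xml) == (txn.findtext("Hash") or "").strip()
```

A mismatch does not say what changed or who changed it; that is what the reconstruction model in the later slides addresses.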
12. Applying Automated Detection of Data Irregularities (2)
Type of Error: XML Data

Lexical:
1.1. A tag not opened or closed correctly, e.g. a missing ‘<’ or ‘>’.
1.2. Amounts that contain non-numeric characters.
1.3. Reserved characters (‘<’ or ‘>’) used in a transaction statement.

Syntactic:
2.1. The XML schema is violated.
2.2. A transaction entry has a missing or imbalanced tag for: Transaction, Balance, Hash, User, Date, Amount, Account, etc.
2.3. Tags that are not defined, e.g. a tag containing a spelling error in the tag name.
2.4. An entry matches one or more predefined rules specifying an incorrect transaction.

Semantic:
3.1. A tag does not correctly describe the content it contains, e.g. the tag attribute specifies 24-hour time while the time is given in AM/PM: <Time format="HH:mm">12:30 PM</Time>.
13. Applying Automated Detection of Data Irregularities (3)
Type of Error: Data Errors (errors in the accounting data)

Lexical:
1.4. Irregularities in the formatting of the data, introduced by editing the machine-generated data, e.g. numbers within tags use a comma to indicate thousands, but the comma is omitted in certain numbers.
1.5. Bad data within the XML tags, e.g. a ‘;’ or ‘@’ character occurring in a number, or a date with the month larger than 12.

Syntactic:
2.5. A violation of the hierarchical structure and/or order of the tags, e.g. an ID tag that exists in isolation (instead of belonging to a parent tag, such as a transaction), or a transaction tag without children.
2.6. Allocation of optional tags not applicable to the tag object, e.g. listing a vehicle's asset number in a furniture purchase transaction.

Semantic:
3.2. Omission of part of a transaction, e.g. a transaction with a missing corresponding double entry.
3.3. Transaction ID errors: an ID skipped or an ID repeated.
3.4. Violation of transaction logic, e.g. purchase fulfilment comes before the order.
14. Applying Automated Detection of Data Irregularities (4)
Handling of errors:
Lexical: Typically Panic mode.
Syntactic: Panic mode or Phrase-Level correction. Also,
error productions.
Semantic: Can be done in rule set, e.g. error productions
or global correction, but needs additional consideration by
investigator.
Handling of semantic errors:
Allows for hypothesis leading to reconstruction
Investigator can look at:
Statistical analysis, e.g. Benford's law
Benford's law (also known as the first-digit law) applies to most large sources of numerical data and describes the frequency distribution of the leading digit of such data. In summary, numbers with a leading digit of 1 should occur around 30% of the time, while larger leading digits occur progressively less frequently.
Analysis of time trends
Transaction order
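The Benford check above can be sketched directly. The deviation metric below (sum of absolute frequency differences against the expected log10(1 + 1/d) distribution) is an illustrative choice for the sketch, not the dissertation's method; amounts are assumed to be positive integers.

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected frequency of leading digit d under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """First significant digit of a positive integer amount."""
    while n >= 10:
        n //= 10
    return n

def first_digit_deviation(amounts):
    """Sum of |observed - expected| leading-digit frequencies.
    Larger values suggest the amounts deviate from Benford's law
    and may merit closer investigation."""
    counts = Counter(leading_digit(a) for a in amounts if a > 0)
    total = sum(counts.values())
    return sum(abs(counts[d] / total - benford_expected(d)) for d in range(1, 10))
```

A high deviation is a lead for the investigator, not proof of tampering; legitimate data sets can also fail a Benford check.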
15. Advantages
Investigation time shortened
Triage: Indication of whether XML accounting data
file requires further investigation
Little chance of error/non-detection of data
irregularities.
16. Reconstructing the events
Problem statement
Investigative questions
When?
What?
Who?
Why?
How?
XML does not store this information, and it is not available elsewhere
Black box
Similar to aircraft crash
Instrumentation
17. Reconstructing the events (2)
Minimum set of evidence required:
Evidence showing the details of the data modifications;
Evidence stating the date and time of the modification;
and
Evidence showing who modified the data.
How & why not covered.
Architecture
Logging of evidence
Event reconstruction history not available
Need for real-time logging
Interrupts vs. Real-time Proxy
Reference monitor
Circumvention? Need for tamper-proofing of XML file.
Digital Signatures (email)
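The tamper-proofing step could be sketched as a seal-and-verify pair. The model calls for digital signatures under the reference monitor's private key; since Python's standard library has no asymmetric crypto, an HMAC over the file stands in here purely to show the mechanics (a real deployment would use an asymmetric scheme such as XML-DSig, so verifiers need no secret).

```python
import hmac
import hashlib

# Placeholder key: in the model this secret lives inside the reference
# monitor and must itself be protected against extraction.
KEY = b"reference-monitor-secret"

def seal(xml_bytes):
    """Produce a tamper-evidence tag over the XML file's bytes."""
    return hmac.new(KEY, xml_bytes, hashlib.sha256).hexdigest()

def verify(xml_bytes, tag):
    """Check the file against its tag; any modification breaks the match."""
    return hmac.compare_digest(seal(xml_bytes), tag)
```

`hmac.compare_digest` is used instead of `==` to avoid timing side channels during verification.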
19. Reconstructing the events (3)
Reconstructing the ‘What?’
Version Control
Reconstructing the ‘When?’
Logging
Timestamps
Local vs. trusted external
Reconstructing the ‘Who?’
Disclaimer
Username/Password authentication
Storing the evidence
Encryption
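The minimum evidence set (what, when, who) could be recorded as a hash-chained log, so that editing or deleting an entry breaks the chain. Field names are assumptions for the sketch; the model's encryption of stored evidence and its trusted external timestamp source are omitted here.

```python
import json
import hashlib
from datetime import datetime, timezone

def log_entry(prev_hash, user, change):
    """Record who changed what, and when; chain to the previous entry."""
    record = {
        "who": user,
        "what": change,
        "when": datetime.now(timezone.utc).isoformat(),  # ideally from a trusted external source
        "prev": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record

def chain_valid(entries):
    """Verify each entry's own hash and its link to the previous entry."""
    prev = "0" * 64  # genesis value for the first entry
    for e in entries:
        body = {k: v for k, v in e.items() if k != "hash"}
        if e["prev"] != prev:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

The chain makes tampering with the log evident, but availability still depends on the reference monitor not being bypassed.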
22. Conclusion
Research problem
XML financial data is susceptible to tampering due to the
human-readability property required by the XML data
specification. Upon presentation of a set of XML financial
data, how can one determine whether data has been
tampered with, and reconstruct the past events so that the
nature of tampering can be determined?
Proposal
Method to detect data irregularities
Compiler
Model to reconstruct events
Instrumentation
23. Conclusion (2)
Self evaluation & future work
Despite best efforts, there are always areas in research where the answers are not clear or the proposed solution leads to more questions. It is therefore important to step back and reflect on the suggested work to identify shortcomings and areas for future work.
Detecting data irregularities
Shows great promise, but no real-world implementation yet (prototype)
Rule set is key
Incomplete/bad rule set – compiler won’t work
Template Rule Sets (future work)
Expanding use of errors & error handling
Global error correction
24. Conclusion (3)
XML Accounting Trail
Lack of real-world implementation
If the reference monitor is compromised, the model's guarantees are lost.
A securely stored private key already provides some protection, but not complete protection.
Securing the reference monitor using anti-forensic & anti-hacking techniques to protect the private key against extraction.
25. Published Work
XBRL-Trail: A Model for Introducing Digital Forensic
Readiness to XBRL, In Proceedings of the Fourth
International Workshop on Digital Forensics &
Incident Analysis (WDFIA), 2009, pages 93-104.
Detecting XML Data Irregularities by Means of Lexical
Analysis and Parsing. In Proceedings of the 9th
European Conference on Information Warfare and
Security, 2010, pages 151-159.
26. Acknowledgements
Prof. Martin S. Olivier
Prof. Stefan Gruner
Dr. Wynand van Staden
Employers (PwC/RMB) specifically Michael Nean
Fiancée – Dr. Sheena Steyl
Mom & Dad
Dedicated to my Mom (passed away 09 Sept 2009)