Data Analytics for Internal Auditors - Understanding Sampling

3/11/2019
Data Analytics - 1
Understanding Sampling
based on Data Analytics for
Internal Auditors
by Richard Cascarino
About Jim Kaplan, CIA, CFE
• President and Founder of AuditNet®, the global resource for auditors (now available on iOS, Android and Windows devices)
• Auditor, Web Site Guru
• Internet for Auditors Pioneer
• Recipient of the IIA’s 2007 Bradford Cadmus Memorial Award
• Author of “The Auditor’s Guide to Internet Resources”, 2nd Edition
About Richard Cascarino, MBA,
CIA, CISM, CFE, CRMA
• Principal of Richard Cascarino &
Associates based in Colorado USA
• Over 28 years experience in IT audit
training and consultancy
• Past President of the Institute of
Internal Auditors in South Africa
• Member of ISACA
• Member of Association of Certified
Fraud Examiners
• Author of Data Analytics for Internal
Auditors
About AuditNet® LLC
• AuditNet®, the global resource for auditors, is available on the
Web, iPad, iPhone, Windows and Android devices and features:
• Over 3,000 Reusable Templates, Audit Programs,
Questionnaires, and Control Matrices
• Training without Travel Webinars focusing on fraud, data
analytics, IT audit, and internal audit
• Audit guides, manuals, and books on audit basics and using
audit technology
• LinkedIn Networking Groups
• Monthly Newsletters with Expert Guest Columnists
• Surveys on timely topics for internal auditors
• NASBA Approved CPE Sponsor
Introductions
The views expressed by the presenters do not necessarily represent
the views, positions, or opinions of AuditNet® LLC. These materials,
and the oral presentation accompanying them, are for educational
purposes only and do not constitute accounting or legal advice or
create an accountant-client relationship.
While AuditNet® makes every effort to ensure information is
accurate and complete, AuditNet® makes no representations,
guarantees, or warranties as to the accuracy or completeness of the
information provided via this presentation. AuditNet® specifically
disclaims all liability for any claims or damages that may result from
the information contained in this presentation, including any
websites maintained by third parties and linked to the AuditNet®
website.
Any mention of commercial products is for information only; it does
not imply recommendation or endorsement by AuditNet® LLC
Today’s Agenda
• Statistical and Non-Statistical Concepts
• Judgmental vs Statistical Sampling
• Probability theory in Data Analysis
• Types of Evidence
• Sampling Methods
• Sample Sizing and Selection
• Attribute vs Variables Sampling
• PPS Sampling
• Population Analysis
• Correlations and Regressions
Statistical & Nonstatistical Concepts
• Why Sample?
  • Speed
  • Cost
  • To ensure data is "Substantially or Materially Correct"
  • Sampling is not mentioned in the IIA Standards, but "Information should be Sufficient, Competent, Relevant, and Useful to provide a sound basis for audit findings and recommendations"
• "Sufficient information is Factual, Adequate, and Convincing so that a prudent, informed person would reach the same conclusion as the auditor"
Statistical vs Nonstatistical Sampling
• Similarities
  • Both require Auditor Judgment
  • Audit Procedures performed will not differ
  • Both are permitted in Audit practice
• Differences: Statistical Plans
  • Control and Measure Sampling Risk
  • Require Technical Training and Expertise
  • Normally Require Computer Facilities
Statistical Sampling, Advantages
• Provides the opportunity to select the minimum sample size required to satisfy the objectives
• Provides a quantitative measure of the sampling risk
• Permits the auditor to explicitly specify a level of Reliability (Confidence) and a desired degree of Precision (Materiality)
• Provides a measure of sufficiency of the evidence gathered
• Provides for more objective results for management
• Provides a more defensible expression of test results
• Is simple to apply with computer software
Statistical Sampling, Disadvantages
• Requires random sample selection, which may be more costly and time consuming
• May lead to problems in establishing a correlation between the sample and the population if not appropriately organized
• May require specific staff training
• May require the acquisition of specialized software
Nonstat Sampling - Advantages & Disadvantages
• Advantages
  • Allows the auditor to use subjective judgment to influence the sample towards items of greatest value and highest risk
  • May be as effective and efficient as statistical sampling but may cost less
• Disadvantages
  • Statistical inferences may not be objectively valid
  • Cannot quantitatively determine sampling risk
  • Risks over- or under-auditing, depending on the auditor's experience and judgment
Terminologies - 1
• Uncertainty
  • Audit Risk - a combination of Inherent Risk, Control Risk and Detection Risk
• Inherent / Control Risks - assessed by Auditor's Judgment
• Detection Risk
  • Sampling Risk - risk of the sample being non-representative
  • Non-sampling Risk - all other aspects of Audit Risk (eg Audit Procedures were not appropriate)
Terminologies - 2
• Sampling Risk
  • Risk of incorrect acceptance (less chance of a successful Audit) (β risk)
  • Risk of incorrect rejection (greater Audit effort) (α risk)
  • Risk of assessing Control Risk too low (less chance of a successful Audit)
  • Risk of assessing Control Risk too high (greater Audit effort)
Terminologies - 3
• Confidence Level (Reliability)
  • Percentage of times one would expect the sample to adequately represent the full population
  • The higher the percentage, the more representative the sample
  • The higher the confidence percentage required, the larger the sample
• Precision
  • How close the sample estimate is to the true population value
Terminologies - 4
• Population (total collection of items about which an opinion will be expressed)
• Sampling Unit (the individual items making up the population)
• Frame (a listing of the Sampling Units making up the Population)
• Sample (collection of sampling units drawn from the frame that will be subject to Audit Procedures)
• Measures of Location / Central Tendency (Mean, Median, Mode)
  • Mean (total value of the population divided by the number of items)
  • Median (middle value when the population is sorted in order)
  • Mode (most frequently occurring value)
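The three measures of location can be illustrated with a minimal sketch. The item values below are hypothetical and chosen only so that each measure is easy to check by hand.

```python
import statistics

# Hypothetical item values, for illustration only
values = [3, 5, 5, 8, 9]

mean = statistics.mean(values)      # total value divided by the number of items
median = statistics.median(values)  # middle value of the sorted items
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)  # 6 5 5
```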
Bias in Sampling
• May arise if:
  • The sampling frame does not cover the population adequately or accurately
  • The sample is non-random (eg "convenient")
  • Subjective judgment enters into the selection criteria
• These can lead to systematic, non-compensating errors in a sample
Definition of Probability
• Experiment: toss a coin twice
• Sample space: possible outcomes of an experiment
  S = {HH, HT, TH, TT}
• Event: a subset of possible outcomes
  A = {HH}, B = {HT, TH}
• Probability of an event: a number assigned to an event
  Pr(A)
Joint Probability
• For events A and B, joint probability Pr(AB) stands for the probability that both events happen.
• Example: A = {HH}, B = {HT, TH}, what is the joint probability Pr(AB)?
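The coin-toss example can be worked through by enumerating the sample space. This is a minimal sketch that assumes a fair coin, so that all four outcomes are equally likely; the helper name pr is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a coin: HH, HT, TH, TT
sample_space = {"".join(tosses) for tosses in product("HT", repeat=2)}

def pr(event):
    """Probability of an event (a set of outcomes), assuming equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {"HH"}
B = {"HT", "TH"}

print(pr(A))      # 1/4
print(pr(B))      # 1/2
print(pr(A & B))  # 0, because A and B have no outcome in common
```

Because A and B cannot happen on the same pair of tosses, their joint probability is zero.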
Primary Types of Sampling
• Attribute Sampling
  • Two-way (dichotomous) scale
  • Primarily yes / no type answers
  • Looks at likelihood of errors in populations
• Variables Sampling
  • Samples a population based upon some specific variable (eg Value)
  • Quantitative information
  • Used to obtain estimates of values etc
Types of Evidence
• Obtained by:
  • Inspection
  • Observation
  • Documentation
  • Confirmation
  • Inquiry
  • Analytical Procedures
  • Recalculation
  • Re-performance
Primary Types of Sampling
• Probability Proportionate to Size (PPS)
  • A new approach to Variables Sampling
  • Uses Attribute Sampling methods to estimate monetary (Rand) amounts
  • Uses "Dollar" as sampling unit to select items for audit
  • Also called Dollar Unit Sampling (DUS), Cumulative Monetary Amount (CMA), and Combined Attribute Variables (CAV) sampling
Sampling Methods
• Sampling Approaches (Sampling Plans)
  • Attributes Sampling
  • Discovery Sampling
  • Stop-or-Go Sampling (Sequential Sampling)
  • Variables Sampling
  • PPS Sampling
  • Judgmental
Non-Statistical Selection Methods (1)
• Haphazard Selection
  • Auditor's best guess of a representative sample
  • Often used where no extrapolation will be done
• Block Selection
  • Used on blocks of transactions (eg all transactions within a time scale)
  • Use with caution since inferences beyond the block may be invalid
Non-Statistical Selection Methods (2)
• Judgment Method
• Basic Issues
  • Value of items
  • Relative risk
  • Representativeness
Sampling Methods (Techniques of Selection)
• Random Number Sampling
• Interval Sampling (Systematic Selection)
• Stratified Sampling
• Block Sampling (Cluster Sampling)
• Probability Proportionate to Size (PPS)
• Mechanized
Random Number Sampling
• Every unit in a population has an equal chance of being selected
• Use of Random Number Tables
• Random Number Generators
• Numbers outside the selection range are ignored
• Sampling with Replacement
• Sampling without Replacement
• Normal assumption in applied statistics is Without Replacement
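Selection with and without replacement can be sketched with Python's standard random module. The population of document numbers, the sample size of 30, and the seed are hypothetical; a fixed seed is used only so that the selection can be re-performed.

```python
import random

rng = random.Random(2019)          # fixed seed (hypothetical) so the draw is repeatable
population = list(range(1, 1001))  # hypothetical document numbers 1..1000

# Without replacement: an item can be selected at most once
without_replacement = rng.sample(population, k=30)

# With replacement: the same item may be selected more than once
with_replacement = rng.choices(population, k=30)

print(sorted(without_replacement))
print(sorted(with_replacement))
```

Both calls give every unit an equal chance on each draw; only the second can return duplicates.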
Main Features of Simple Random Sampling
• "Simple" as compared to Systematic and Stratified Sampling
• Randomization ensures the validity of inference
• The standard against which other sampling methods are evaluated
• Suitable where the population is relatively small
• Suitable where the sampling frame is complete
Systematic Sampling (Interval Sampling)
• From an unnumbered population
• Picking the first item at random
• Then at a regular interval (eg every 10th item)
• Assumes
  • Random distribution across the population
• True randomness of selection is not required
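Interval selection with a random start can be sketched as follows. The invoice identifiers, the interval of 10, and the seed are hypothetical.

```python
import random

def systematic_sample(items, interval, rng):
    """Pick the first item at random within the first interval, then every interval-th item."""
    start = rng.randrange(interval)  # random start position in 0..interval-1
    return items[start::interval]

rng = random.Random(7)                               # hypothetical fixed seed
invoices = [f"INV-{i:04d}" for i in range(1, 201)]   # hypothetical population of 200 invoices
selected = systematic_sample(invoices, 10, rng)

print(len(selected))  # 20
print(selected[:3])
```

If the population has a pattern that repeats at the same interval, the selection inherits that pattern, which is why the method assumes a random distribution across the population.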
Stratified Sampling
• Used to minimize the variability of population units (control distortion)
• Permits the drawing of a smaller sample size
• Population stratified into mutually exclusive groups
• Requires clearly definable strata
• Need not be stratified on a value basis
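A minimal sketch of stratified selection follows. The value bands, the per-stratum sample sizes, and the invoice amounts are all hypothetical; any attribute that splits the population into mutually exclusive groups could be used in place of value.

```python
import random

def stratified_sample(items, stratum_of, per_stratum, rng):
    """Group items into mutually exclusive strata, then sample each stratum at random."""
    strata = {}
    for item in items:
        strata.setdefault(stratum_of(item), []).append(item)
    return {
        name: rng.sample(members, min(per_stratum[name], len(members)))
        for name, members in strata.items()
    }

def value_band(amount):
    # Hypothetical strata boundaries, chosen for illustration only
    if amount < 1_000:
        return "low"
    if amount < 10_000:
        return "medium"
    return "high"

rng = random.Random(42)                  # hypothetical fixed seed for re-performance
amounts = list(range(100, 20_100, 100))  # hypothetical invoice values 100..20,000
sample = stratified_sample(amounts, value_band, {"low": 5, "medium": 10, "high": 20}, rng)

for name, picked in sample.items():
    print(name, len(picked))
```

Sampling the high-value stratum more heavily is one way to reduce the variability within each group without stratifying the whole population at the same rate.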
Block Sampling (Cluster Sampling)
• Used where Simple Random Sampling is too time-consuming or expensive
• Clusters of items are selected at random
• Clusters may then be sampled or 100% checked
• Clusters may be "Natural" (items normally found together)
• Clusters may be "Artificial" (selected by the auditor)
Probability Proportional to Size (Dollar Unit Sampling)
• Variation of Attribute Sampling
• Uses Attribute Approach to express an opinion in value terms rather than a deviation rate
• An alternative to stratification
• Samples cumulative values on a value interval
• Automatically biases hits towards high value items
Sample Sizing and Selection
• Two paths to selection
• Directed
  • Used when serious error or manipulation is suspected
  • Not scientific sampling
  • Used purely to detect a suspected condition
  • May not be relied on to draw conclusions about the population
• Random Sampling
  • Seeks to represent the population
  • Taking a snapshot in miniature
  • The larger the sample, the closer it depicts the population and the more it can be relied upon
Factors affecting sample size
Population size?
Population variability
Expected error rate
Desired precision
Confidence level
Tolerable Error
Sample Sizing
Population Size
• Sample size increases as population increases
• Increase is not proportional
  • Populations of over 5000 require very little increase in sample size
  • Population 50: Sample 33
  • Population 55000-100000: Sample 93
Population Variability (Variables Sampling)
• Substantial effect on Sample size
• Variability is the Standard Deviation of a population
• Standard Deviation is computed by
  • Taking the difference of each item from the mean
  • Squaring the difference
  • Adding the squares and averaging
  • Taking the square root of the average
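The four steps can be followed one at a time in a short sketch. The values are hypothetical and were chosen so that the result is a round number.

```python
import math

def population_std_dev(values):
    """Population standard deviation, following the four steps literally."""
    mean = sum(values) / len(values)
    differences = [v - mean for v in values]  # difference of each item from the mean
    squares = [d * d for d in differences]    # square each difference
    average = sum(squares) / len(squares)     # add the squares and average
    return math.sqrt(average)                 # square root of the average

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical item values
print(population_std_dev(values))  # 2.0
```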
Effect of Standard Deviation on Sample Size
• As the Standard Deviation increases, the sample size increases
• Rule of thumb
  • A change in a population's variability affects the Sample size by the square of the relative change
• Where there is a large deviation, Stratification may be required
• Generally, the more widespread the values, the larger the sample
Expected Error Rate (Attribute Sampling)
• Initial Auditor assessment of expected population error rate (Deviation Rate or Rate of Occurrence)
• The higher the expected error rate, the larger the sample
  • If an expected error rate of 1% gave a sample size of 93
  • Then an expected error rate of 3% would give a sample size of 361
  • All other factors being equal
• If the sample shows a higher than expected error rate?
Desired Precision
• Also called Desired Allowance for Sampling Risk
  • eg Inventory is estimated at R 1,000,000 plus/minus R 200,000
• The tighter the desired precision, the larger the sample size required
• Sample size changes by the square of the relative change in precision
  • eg +/- R 50,000 instead of +/- R 200,000 is a change in desired precision by a factor of 4
  • Sample size would increase by a factor of sixteen
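The square rule can be written out for the inventory example, with n the sample size and P the desired precision. The subscripts "old" and "new" are labels introduced here for the two precision settings.

\[ \frac{n_{\text{new}}}{n_{\text{old}}} = \left( \frac{P_{\text{old}}}{P_{\text{new}}} \right)^2 = \left( \frac{200{,}000}{50{,}000} \right)^2 = 4^2 = 16 \]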
Confidence Level (Reliability)
• Percentage of time that the sample adequately represents the population (ie that the estimation of value can be x% relied upon)
  • A 95% confidence level states that, for a given sample size, if the sample was taken 100 times, 95 times the sample selected would adequately represent the population
  • The higher the confidence required, the larger the sample
  • In Variables Sampling, primary concern is Risk of incorrect acceptance
  • In Attribute Sampling, primary concern is Risk of assessing the control risk too low
Confidence Interval (Precision)
• Tolerable misstatement of a value
  • eg R 100,000 +/- R 10,000 gives Confidence Interval of R 90,000 to R 110,000
• Primary concern is the risk of Incorrect Rejection
• Relates to efficiency of the audit
Confidence Interval
• Found by multiplying a Reliability Factor by the standard error of the Sample (its Standard Deviation divided by the Square Root of the Sample Size)
• Then adding and subtracting from the Sample Estimate
• Assuming a Normal Distribution
  • eg a 95% confidence level results in a 1.96 reliability factor
• Confidence Interval therefore equals
\[ \text{Estimated Value} \pm \left( \text{Reliability Factor} \times \frac{\text{Standard Deviation}}{\sqrt{\text{Sample Size}}} \right) \]
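The interval calculation can be sketched as below. The sample values are hypothetical, the default reliability factor of 1.96 corresponds to the 95% level quoted above, and the sketch takes the sample mean as the estimated value and relies on the same normal-distribution assumption.

```python
import math
import statistics

def confidence_interval(sample, reliability_factor=1.96):
    """Sample mean +/- reliability factor x (standard deviation / sqrt(sample size))."""
    estimate = statistics.mean(sample)
    std_dev = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    precision = reliability_factor * std_dev / math.sqrt(len(sample))
    return estimate - precision, estimate + precision

# Hypothetical audited values of 16 sampled items
sample = [98, 102, 95, 110, 105, 99, 101, 97, 104, 100, 96, 103, 108, 94, 100, 102]
low, high = confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

A larger sample shrinks the interval because the sample size sits under the square root in the divisor.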
Tolerable Error
• The maximum rate of deviations the Auditor will accept
• The closer the expected error rate is to the Tolerable Error, the larger the sample
• The larger the Tolerable Error, the smaller the sample
Calculating the Sample Size (Attribute Sampling)
• Where
  • C is the Confidence Coefficient
  • p is the max error rate
  • q is 100% - p
  • P is the desired precision
  • n is the sample size
• then
\[ n = \frac{C^2 \, p \, q}{P^2} \]
For Example
• Where the population is 1000, desired precision is +/- 2%, desired confidence level is 95% and the estimated error rate is not to exceed 5%, then
  • C = 1.96 (Confidence Coefficient at 95%)
  • p = 0.05
  • q = 0.95
  • P = 0.02
\[ n = \frac{1.96^2 \times 0.05 \times 0.95}{0.02^2} = \frac{0.182476}{0.0004} \approx 456.2 \]
• n = 456.2, rounded up to 457 items
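Evaluating n = C²pq / P² with the values given (C = 1.96, p = 0.05, q = 0.95, P = 0.02) yields roughly 456, which the sketch below confirms. The function name is illustrative, and rounding up to the next whole item is an assumption about how a fractional result is handled.

```python
import math

def attribute_sample_size(confidence_coefficient, max_error_rate, precision):
    """n = C^2 * p * q / P^2, with q = 1 - p and rates expressed as fractions."""
    p = max_error_rate
    q = 1 - p
    return confidence_coefficient ** 2 * p * q / precision ** 2

n = attribute_sample_size(1.96, 0.05, 0.02)
print(round(n, 1))   # 456.2
print(math.ceil(n))  # 457
```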
Variables Sampling
• Usually deals with monetary values
• Can be used for other variable values
  • Time Periods
  • Quantities
  • Weights
• Determines estimates of values etc. to predetermined tolerances
• Determine
  • Population Size
  • Desired Confidence Level
  • Desired Precision
• but
  • Instead of error rate
  • Find the Standard Deviation
Calculating the Sample Size (Variables Sampling)
• Where
  • n1 is the preliminary sample size
  • C is the confidence coefficient
  • S is the standard deviation of the population
  • P is the desired precision
• then
\[ n_1 = \frac{C^2 S^2}{P^2} \]
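A minimal sketch of the preliminary sample size calculation follows. The standard deviation of 250 and the precision of 50 are hypothetical, and the sketch assumes both are expressed in the same units per item.

```python
import math

def variables_sample_size(confidence_coefficient, std_dev, precision):
    """Preliminary sample size n1 = C^2 * S^2 / P^2."""
    return (confidence_coefficient * std_dev / precision) ** 2

# Hypothetical inputs: 95% confidence, S = 250, P = 50 (same units per item)
n1 = variables_sample_size(1.96, 250.0, 50.0)
print(round(n1, 2))   # 96.04
print(math.ceil(n1))  # 97
```

Doubling the standard deviation with everything else unchanged multiplies the result by four, which is the square rule for variability.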
Acceptance, Discovery, Stop-or-Go Sampling
• Special applications of Attribute Sampling
  • Testing whether an error rate is less than a specified level (Acceptance)
  • Sampling for critical errors (Discovery)
  • Minimum sample to determine an error rate at a prescribed confidence level (Stop-or-Go)
Standard Deviation
• Where
  • s = Standard deviation of the sample
  • Σ = Sum of
  • x = Value of each sample item
  • n = Sample size
• then
\[ s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
• For Example
  • In a population where the Mean is 20, three items were sampled, with values 11, 20, 29
\[ s = \sqrt{\frac{1362 - \frac{60^2}{3}}{3 - 1}} = \sqrt{\frac{162}{2}} = \sqrt{81} = 9 \]
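The sample standard deviation formula can be checked against the three values in the example (11, 20, 29). The function name is illustrative.

```python
import math

def sample_std_dev(values):
    """s = sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(sample_std_dev([11, 20, 29]))  # 9.0
```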
In a Normal Distribution
• Mean of the Distribution +/- one standard deviation includes 68% of the area under the curve
• Mean +/- two standard deviations includes 95.5% of the area
• Mean +/- three standard deviations includes 99.7% of the area
Normal Distributions
Std Deviations              Area Under the Curve
(Confidence Coefficient)    (Confidence Level)
1.0                         68%
1.64                        90%
1.96                        95%
2.0                         95.5%
2.58                        99%
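The relationship between the number of standard deviations and the area under the curve can be reproduced with the standard library's NormalDist class (Python 3.8 or later). The helper names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(z):
    """Area under the curve between -z and +z standard deviations."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

def coefficient_for(confidence_level):
    """Two-sided confidence coefficient for a given confidence level."""
    return standard_normal.inv_cdf(0.5 + confidence_level / 2)

for z in (1.0, 2.0, 3.0):
    print(z, round(area_within(z), 4))             # 0.6827, 0.9545, 0.9973

for level in (0.90, 0.95, 0.99):
    print(level, round(coefficient_for(level), 2))  # 1.64, 1.96, 2.58
```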
Probability Proportional to Size Sampling (PPS)
• Also known as
  • Dollar Unit Sampling (DUS)
  • Cumulative Monetary Amount (CMA)
  • Combined Attribute Variables (CAV)
• Commonly used to assess whether values are overstated
• Uses a different formula to determine Sample Size
Sample Size
• Where
  • n = Sample size
  • BV = Book value of the account (eg Accounts Receivable)
  • RF = Risk factor (multiplier, see below)
  • TE = Tolerable error (auditor's judgment)
• then
\[ n = \frac{BV \times RF}{TE} \]
Reliability Factors
Reliability Required    Reliability Factor
99%                     4.605
95%                     2.996
90%                     2.303
• For Example
  • Value of Inventory = R 500,000
  • Auditor Specified Material Error = R 10,000
  • Auditor Determined Little effective control (ie RF = 2.6)
\[ n = \frac{BV \times RF}{TE} = \frac{500{,}000 \times 2.6}{10{,}000} = \frac{1300}{10} = 130 \]
  • Sampling Interval is therefore 500,000 / 130 = 3846
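The sample size and sampling interval in the example can be recomputed as below. The second function is an assumption added for illustration: the tabulated factors match the negative natural logarithm of the accepted risk (1 minus the reliability) when rounded to three decimals, which is the usual derivation when no errors are expected.

```python
import math

def pps_sample_size(book_value, risk_factor, tolerable_error):
    """n = BV x RF / TE, rounded up to a whole item."""
    return math.ceil(book_value * risk_factor / tolerable_error)

def reliability_factor(reliability):
    """Assumed derivation of the tabulated factors: -ln(1 - reliability)."""
    return -math.log(1 - reliability)

n = pps_sample_size(500_000, 2.6, 10_000)
interval = 500_000 // n  # whole-Rand sampling interval
print(n, interval)       # 130 3846

for level in (0.99, 0.95, 0.90):
    print(level, round(reliability_factor(level), 3))  # 4.605, 2.996, 2.303
```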
Application
Stock    Unit Value    Amount    Cum Amount
30       16            480       480
90       100           9000      9480
92       111           10212     19692
70       40            2800      22492
20       15            300       22792
• Sampling Interval = 3846
• If no errors found, Auditor concludes
  • Finished Goods Inventory has a maximum overstatement of R 10,000 with 95% Reliability
• If errors did occur
  • The average error amount must be projected to the whole population (Tainting Percentage)
PPS Advantages
• PPS tends to select high value items
• PPS is unaffected by population item variability
• PPS may result in a smaller sample size
• PPS is easy to implement
• PPS does not require the normal approximations required by variables sampling
• PPS permits a statistically valid sample selection which includes more high value items
PPS Disadvantages
• PPS requires that the population be cumulatively totaled
• As errors increase, the sample size may be larger than with other sampling methods
• PPS is primarily designed to detect overstatements
• Zero or negative items are presumed not to occur
• PPS is not intuitively as appealing to auditors
Other Sampling Types
• Difference Estimation
  • Determine the difference between audit and book values
  • Calculate the mean difference
  • Multiply the mean difference by the number of items in the population
  • Allow for sampling risk
  • Useable where small errors predominate and there is no skew
• Mean-per-unit (MPU) Sampling
  • Average the audit value of the sample
  • Multiply it by the population size (Not very accurate)
• Ratio Estimation
  • Multiply the book value of the population by the ratio of audit value to book value of the sample
  • Useable where small errors predominate and there is no skew
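The three estimators can be compared on one small data set. All figures below are hypothetical (four sampled items, a population of 1,000 items with a book value of 250,000), and the allowance for sampling risk is left out of the sketch.

```python
def difference_estimate(book_values, audit_values, population_count):
    """Mean (audit - book) difference projected over the number of items in the population."""
    differences = [a - b for a, b in zip(audit_values, book_values)]
    return sum(differences) / len(differences) * population_count

def mean_per_unit_estimate(audit_values, population_count):
    """Average audit value of the sample multiplied by the population size."""
    return sum(audit_values) / len(audit_values) * population_count

def ratio_estimate(book_values, audit_values, population_book_value):
    """Population book value multiplied by the sample's audit-to-book ratio."""
    return population_book_value * sum(audit_values) / sum(book_values)

# Hypothetical sample and population figures
book = [100, 200, 300, 400]
audit = [100, 190, 300, 390]

print(difference_estimate(book, audit, 1_000))  # -5000.0 (projected difference)
print(mean_per_unit_estimate(audit, 1_000))     # 245000.0 (estimated audited value)
print(ratio_estimate(book, audit, 250_000))     # 245000.0 (estimated audited value)
```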
Non-Parametric Tests
• No assumption of normal distribution
  • Wald-Wolfowitz Runs Test
  • Mann-Whitney or Wilcoxon Test
  • Kolmogorov-Smirnov Test
  • Fisher's Exact Test
• All use two samples
Population Analysis
• N1 and N2 are regions of normal behavior
• Points o1 and o2 are anomalies
• Points in region O3 are anomalies
[Figure: scatter plot of X against Y showing regions N1 and N2, points o1 and o2, and region O3]
Types of Data
• Nominal
  • Person’s name, Country etc.
• Ordinal
  • Information with a natural sequence
  • Eg finishing order in a race
• Interval
  • Ordinal Data with equal intervals
• Ratio
  • Interval measurable from a fixed base
Data Analysis Software
• ACL Audit Analytics
  • Powerful program for data analysis
  • Most widely used by auditors worldwide
• CaseWare’s IDEA
  • Recent versions include an increasing number of fraud techniques
  • ACL’s primary competitor
Correlation Analysis
• Measurement of the extent of association of one variable with another
• Two variables are said to be correlated when they move together in a detectable pattern
• Direct correlation is said to exist when both variables increase or decrease at the same time, although not necessarily by the same amount
• Correlation analysis is used by internal auditors to identify those factors which appear to be related
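One common measure of association is the Pearson correlation coefficient, sketched below. The choice of Pearson's coefficient and the monthly sales and commission figures are assumptions made for illustration; the slides do not name a specific measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)

# Hypothetical monthly sales and commission payments
sales = [120, 150, 170, 200, 260, 300]
commission = [12, 14, 18, 19, 27, 29]

print(round(pearson_r(sales, commission), 3))
```

A value near +1 indicates direct correlation, a value near -1 indicates that one variable falls as the other rises, and a value near 0 indicates no linear association.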
Examples
Regressions
• Regression analysis models a presumed causal relationship between the two sets of data, so that not only is the data related, but a change in one is expected to produce a change in the other
• Regression is also referred to as the least squares method
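The least squares method can be sketched for a single explanatory variable. The data points are hypothetical, and the function fits a straight line y = intercept + slope × x by minimizing the sum of squared vertical distances.

```python
def least_squares_fit(xs, ys):
    """Slope and intercept of the straight line that minimizes squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = least_squares_fit(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.97 0.09
```

The fitted line describes how the two series move together in the data; whether one truly causes the other still has to be established by the auditor.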
Example
Questions?
Any Questions?
Don’t be Shy!
AuditNet® and cRisk Academy
• If you would like forever access to this webinar recording
• If you are watching the recording, and would like to obtain CPE credit for this webinar
• Previous AuditNet® webinars are also available on-demand for CPE credit
http://criskacademy.com
http://ondemand.criskacademy.com
Use coupon code: 50OFF for a discount on this webinar for one week
Thank You!
Jim Kaplan
AuditNet® LLC
1-800-385-1625
Email:info@auditnet.org
www.auditnet.org
Richard Cascarino & Associates
Cell: +1 970 819 7963
Tel +1 303 747 6087 (Skype Worldwide)
Tel: +1 970 367 5429
eMail: rcasc@rcascarino.com
Web: http://www.rcascarino.com
Skype: Richard.Cascarino