Big Data and other
Buzzwords
Cutting Through the Noise to Develop a Strong Internal Audit
Analytics Program
Andrew Clark
About me
● B.S. in Business Administration with a concentration in Accounting, Summa Cum Laude, from
University of Tennessee at Chattanooga.
● M.S. in Data Science from Southern Methodist University.
● Ph.D. Candidate in Economics at the University of Reading, specializing in International Monetary
Policy.
● American Statistical Association Graduate Statistician (GStat), INFORMS Certified Analytics
Professional (CAP) and AWS Certified Solutions Architect – Associate.
● Experienced in designing, building and deploying numerous machine learning and continuous monitoring
solutions using open source technologies.
● Successfully developed and deployed an Audit Analytics program for a publicly traded decentralized
manufacturing company.
● Working as a Data Economist creating ecosystem economic design specifications by employing
mathematical engineering technologies to create novel solutions to solve business problems.
0. Outline
➔ Signal vs Noise
What is noise and what is important for
audit practitioners to know about
analytics?
➔ Current State of Analytics
What is the current state of the art in
audit analytics?
➔ Developing an Audit Analytics
Program
Steps to take, what to avoid, etc.
➔ Questions
Signal vs Noise
Analytics in the Wild
What do all these buzzwords
mean?
“Machine Learning based, artificial intelligent, Big Data
spewing, Deep Learning, Neural Network touting, Blockchain
based, Cognitive Computing, Virtual Reality Natural
Language Processing,…Chat Bot.”
Big Data - “The New Oil”
What is Machine
Learning?
A computer recognizing patterns
without having to be explicitly
programmed.
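A toy illustration of that definition, not a production method: instead of hard-coding a dollar cut-off, the function below picks one from labeled examples. The name `learn_threshold` and the sample figures are hypothetical.

```python
def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labeled examples.

    examples: list of (value, is_exception) pairs. Values at or above the
    returned threshold are predicted to be exceptions.
    """
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_errors = None, None
    for threshold in candidates:
        # Count examples where the prediction disagrees with the label.
        errors = sum((value >= threshold) != label for value, label in examples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Made-up history: payment amounts an auditor did or did not mark as exceptions.
LABELED_PAYMENTS = [
    (120, False), (300, False), (450, False),
    (5200, True), (7400, True), (9100, True),
]

if __name__ == "__main__":
    print(learn_threshold(LABELED_PAYMENTS))
```

The rule comes from the data: feed in a different history and the cut-off moves without anyone rewriting the code.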
What many businesses get wrong
The Current State of Audit Analytics
The Current State of Audit Analytics
● Besides periodic ad hoc analysis, very few companies in
any sector, let alone manufacturing, have deployed Audit
Analytics on a large scale.
● Key reasons why:
○ Lack of audit leadership knowledge around the capabilities and how
to set up a program
○ Budgetary constraints - Developers are expensive!
○ Skepticism of the effectiveness of audit analytics (hint: in
Manufacturing, it can be massively effective).
Building an Audit Analytics Program
Steps to take:
● Define what your needs are and desired end state. Ideally this
will include comprehensive planning, scoping, mapping to risk
areas, etc.
● Create a timeframe and budget allocation for the program
ramp up (expect close to a year before a comprehensive
program will begin changing how you audit)
● Hire and/or train the appropriate staff.
● Follow proven methodologies to avoid reinventing the wheel when
possible.
● Know when to ask for help from your colleagues and when to ask
for outside help.
What is Systems Engineering?
Tooling
● Open Source software, such as Python
○ Pros:
■ Vibrant community
■ State of the Art technology
■ Customizable and scalable
■ Cost
○ Cons:
■ Requires programming knowledge
■ Good programmers are expensive and auditors who can
program well are essentially nonexistent
Tooling Cont.
● Traditional CAATs, such as IDEA and ACL
○ Pros:
■ Well known within the audit community
■ Vendor support and training courses
■ Some existing knowledge in the audit community
○ Cons:
■ Not very user friendly
■ Requires extensive training to use effectively
■ Not very flexible or scalable
■ Does not provide the output auditors are expecting
Assuming you would like to use Python:
• The hard way: by learning it
• The even harder way: hire an auditor with programming,
analytics and auditing experience
• Create a cross functional team by borrowing a
programmer from IT and a business analyst from the
business.
• The *easiest* and usually most effective way: hire an
experienced Data Science firm to set up the program
with the assistance of your auditors.
CRISP-DM Framework
● Business Understanding
● Data Understanding
● Data Preparation
● Modeling (out of scope for this talk)
● Evaluation
● Deployment
Business Understanding
● The most important step – ‘The Why’
● Why is this needed and what is the desired outcome?
Data understanding
● An understanding of where the data is coming from is key
to good modeling
● SQL relational database? NoSQL database? CSV, TXT,
webpage, Tweets?
● What scale is the data on? For example, Celsius or
Fahrenheit?
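A minimal sketch of why the scale question matters, assuming two hypothetical sources that report temperature in different units. The function name and sample readings are illustrative only.

```python
def to_celsius(value, unit):
    """Convert a temperature reading to Celsius; unit is 'C' or 'F'."""
    unit = unit.strip().upper()
    if unit == "C":
        return float(value)
    if unit == "F":
        return (float(value) - 32.0) * 5.0 / 9.0
    raise ValueError(f"Unknown temperature unit: {unit!r}")


# Made-up readings: one plant logs Celsius, the other Fahrenheit.
READINGS = [("Plant A", 100, "C"), ("Plant B", 212, "F")]

if __name__ == "__main__":
    for plant, value, unit in READINGS:
        print(plant, to_celsius(value, unit))
```

Without the conversion, the two plants look wildly different even though both readings describe the same temperature.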
Data Preparation
● Currently, close to 90% of what Data Scientists do
● ‘Munging’
● Data scaling
● Select variables
● Divide into test and train sets
● “I’m a data janitor. That’s the sexiest job of the 21st
century. It’s very flattering, but it’s also a little baffling.”
– Josh Wills, Head of Data Engineering @ Slack
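The preparation steps listed above can be sketched with the Python standard library alone. This is one possible shape under simple assumptions (amounts arrive as text, min-max scaling is wanted, a 25% test set is acceptable); all function names are hypothetical.

```python
import random


def parse_amount(raw):
    """Munging step: turn text such as ' $1,250.50 ' into a float."""
    return float(raw.strip().replace("$", "").replace(",", ""))


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    raw_amounts = [" $10.00", "20", "1,030.00 "]  # made-up input
    amounts = [parse_amount(a) for a in raw_amounts]
    print(min_max_scale(amounts))
    train, test = train_test_split(range(8))
    print(len(train), len(test))
```

Variable selection is left out because it depends on the audit question being asked.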
Modeling
Out of scope for this discussion, besides basic descriptive
statistics
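For the basic descriptive statistics mentioned here, Python's built-in `statistics` module is often enough. The helper name `describe` and the sample amounts are assumptions for illustration.

```python
import statistics


def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sample_amounts = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
    for name, value in describe(sample_amounts).items():
        print(name, value)
```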
Evaluation
● Does the implementation solve the problem?
Deployment
● Integrated into existing infrastructure or application?
● Separate web application?
● Scheduled job?
● Run ad hoc?
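As one sketch of the scheduled-job option, the function below composes a plain-text exception report as an email message. It only builds the message; sending it (for example with `smtplib`) and scheduling the script (for example with cron) are left to the local environment. The addresses and the function name are hypothetical.

```python
from email.message import EmailMessage


def build_report_email(flagged, sender, recipients, period):
    """Compose (but do not send) a plain-text exception report.

    flagged: list of (entry_id, reason) pairs produced by an analytic test.
    """
    message = EmailMessage()
    message["Subject"] = f"Audit analytics exceptions for {period}"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    if flagged:
        lines = [f"{entry_id}: {reason}" for entry_id, reason in flagged]
    else:
        lines = ["No exceptions found."]
    message.set_content("\n".join(lines))
    return message


if __name__ == "__main__":
    report = build_report_email(
        [("JE-2", "weekend")],             # made-up result
        "analytics@example.com",           # placeholder sender
        ["auditor@example.com"],           # placeholder recipient
        "2018-03",
    )
    print(report)
```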
Common pitfalls and mistakes
● Inadequate planning, hence the suggestion of Systems
Engineering and the CRISP-DM methodology
● Lack of audit committee buy-in, and acknowledgement of
the long lifecycle of scaling up a program
● Missing the required technical staff and technical
leadership skills to manage the process.
● Cultural resistance
Example
Python-based Analytic test
● Snapshot of an analytic test used at a publicly traded,
billion-dollar-plus manufacturer.
● https://github.com/aclarkData/AuditAnalytics
● Journal entry tests for 999 amounts, weekend postings and keywords
● Scheduled and sent as an email to auditors monthly.
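The three journal entry tests named above might look roughly like the sketch below. It is not the code from the linked repository. It assumes "999 amount" means a whole-dollar amount ending in 999, and the keyword list, field names and sample entries are all made up.

```python
from datetime import date
from decimal import Decimal

# Hypothetical keyword list; a real program would tailor it to its risk areas.
SUSPICIOUS_KEYWORDS = ("adjust", "plug", "per ceo", "reverse")


def ends_in_999(amount):
    """Flag amounts whose whole-dollar part ends in 999 (e.g. 4999.00)."""
    return int(abs(amount)) % 1000 == 999


def posted_on_weekend(posting_date):
    """Flag entries posted on Saturday (5) or Sunday (6)."""
    return posting_date.weekday() >= 5


def has_keyword(description):
    """Flag descriptions containing any keyword, ignoring case."""
    text = description.lower()
    return any(keyword in text for keyword in SUSPICIOUS_KEYWORDS)


def flag_entries(entries):
    """Return (entry_id, reasons) pairs for entries that trip any test."""
    flagged = []
    for entry in entries:
        reasons = []
        if ends_in_999(entry["amount"]):
            reasons.append("999 amount")
        if posted_on_weekend(entry["date"]):
            reasons.append("weekend")
        if has_keyword(entry["description"]):
            reasons.append("keyword")
        if reasons:
            flagged.append((entry["id"], reasons))
    return flagged


# Made-up sample data for illustration only.
SAMPLE_ENTRIES = [
    {"id": "JE-1", "amount": Decimal("4999.00"), "date": date(2018, 3, 5),
     "description": "Office supplies"},
    {"id": "JE-2", "amount": Decimal("1250.00"), "date": date(2018, 3, 10),
     "description": "Freight accrual"},
    {"id": "JE-3", "amount": Decimal("300.00"), "date": date(2018, 3, 6),
     "description": "Manual adjustment per controller"},
    {"id": "JE-4", "amount": Decimal("80.25"), "date": date(2018, 3, 7),
     "description": "Utilities"},
]

if __name__ == "__main__":
    for entry_id, reasons in flag_entries(SAMPLE_ENTRIES):
        print(entry_id, ", ".join(reasons))
```

Each test is a small, separate function so auditors can review, tune or switch off one rule without touching the others.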
Conclusion
● Buzzword breakdown
● The Current State of Audit Analytics
● Developing an Audit Analytics Program
● An Audit Analytics Example
Questions?
Thank you!
LinkedIn
Email: andrew@block.science
Phone: 423-504-5024
Personal website
BlockScience website
BlockScience
• BlockScience is an R&D, engineering design, and analytics firm focused on
business model innovation in an increasingly networked society. We define and
solve business challenges by leveraging AI, network science and other
mathematical engineering technologies; we remain human-centered while
adhering to time-tested systems engineering practices.
