Machine Learning for Auditors: What you need to know - ISACA North America CACS 2017

Copyright © 2017 Information Systems Audit and Control Association, Inc. All rights reserved.
Andrew Clark, IT Auditor / Data Scientist
Astec Industries, Inc., M.S. Data Science Candidate

Overview
• What is machine learning?
• Why is it important?
• What do all of the buzzwords mean?
• Non-technical introduction
• What are the two broad types of machine learning?
• How does it pertain to auditors?
• Case studies
• What would a machine learning audit entail?
• Where can I learn more about machine learning?
Kong, Qingkai. "Machine Learning 1 - What is machine learning and real world example." Qingkai's Blog (web log), October 4, 2016. Accessed February 21, 2017.
http://qingkaikong.blogspot.com/2016/10/machine-learning-1-what-is-machine.html?showComment=1484689212391#c4748865641151946089.

What is Machine Learning?
A computer recognizing patterns without having to be explicitly programmed.

Why is Machine Learning Important?
• Disrupting business. For example, ML-powered businesses disrupted
Blockbuster, taxis, etc.
• Revolutionizing existing business models. Predictive maintenance, retailing,
credit card fraud detection.
• One of the key technologies in driving economic growth

What Machine Learning is not:
• Magic
• Going to take your job (for the majority of professionals)
• Always the best tool for the job

What do all the buzzwords mean?
• Machine Learning based artificial intelligent - Big Data spewing - Deep
Learning - Neural Network touting - Cognitive Computing - Virtual Reality -
Natural Language Processing - Chat Bot.

A non-technical introduction
• The steps of the process, when strung together, are called a pipeline
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment
Kearn, Martin. "Machine Learning is for Muggles too!" Microsoft Developer (web log), March 1, 2016. Accessed February
21, 2017. https://blogs.msdn.microsoft.com/martinkearn/2016/03/01/machine-learning-is-for-muggles-too/.

Business Understanding
• The most important step
• ‘The why’
• Why is this needed, and what is the desired outcome?

Data Understanding
• An understanding of where the data is coming from is key to good modeling
• SQL relational database? NoSQL database? CSV, TXT, web page, tweets?
• What scale is the data on? For example, Celsius or Fahrenheit?
• Is the scale the same on all data streams or will transformations be
required?
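The scale question above can be illustrated with a minimal sketch. It assumes two hypothetical temperature streams, one recorded in Celsius and one in Fahrenheit; the readings and variable names are invented for illustration and do not come from the presentation.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Hypothetical readings from two data streams on different scales.
stream_celsius = [21.5, 22.0, 19.8]
stream_fahrenheit = [68.0, 71.6, 212.0]

# Transform the Fahrenheit stream so that every value is on the same scale
# before the data is handed to a model.
combined_celsius = stream_celsius + [
    fahrenheit_to_celsius(f) for f in stream_fahrenheit
]

print(combined_celsius)
```

Without the transformation, a model would treat 68.0 and 21.5 as very different temperatures even though they are close.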

Data Preparation
• Currently, close to 90% of what Data Scientists do
• ‘Munging’
• “I’m a data janitor. That’s the sexiest job of the 21st century. It’s very
flattering, but it’s also a little baffling.” – Josh Wills
Press, Gil. "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says." Forbes. March 23, 2016. Accessed March 13, 2017.
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#21e789136f63.

Modeling
"Choosing the right estimator." Choosing the right estimator, scikit-learn 0.18.1 documentation. Accessed March 13, 2017.
http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.

Evaluation
• Accuracy
• Precision
• Recall
• Does the model solve the problem?
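As a rough illustration of the three metrics above, the sketch below scores a set of hypothetical fraud predictions with scikit-learn. The labels are invented for illustration (1 = fraud, 0 = not fraud) and are not from the presentation.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical ground truth and model predictions (1 = fraud, 0 = not fraud).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

# Accuracy: share of all items classified correctly (6 of 8 here).
accuracy = accuracy_score(y_true, y_pred)

# Precision: of the items flagged as fraud, how many really were fraud (2 of 3).
precision = precision_score(y_true, y_pred)

# Recall: of the real fraud items, how many the model caught (2 of 3).
recall = recall_score(y_true, y_pred)

print(accuracy, precision, recall)
```

For an auditor, recall is often the more telling number: a model can have high accuracy on rare events such as fraud while still missing most of them.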

Deployment
• Integrated into existing infrastructure or application?
• Separate web application?
• Scheduled job?
• Run ad hoc?

Unsupervised Machine Learning
• Given some cleaned data, the algorithm (a series of instructions) divides the
data into groups of similar items.
• Popular models:
– Kmeans
– KNN (K-nearest neighbors; unsupervised when used for neighbor search, supervised when used as a classifier)

Supervised Machine Learning
• Given a labeled dataset (for example, each item labeled ‘fraud’ or ‘not fraud’), the
algorithm is ‘trained’ to recognize which items are fraud and which are not.
• Common techniques include:
– Logistic Regression
– Support Vector Machines

Example, Logistic Regression
from sklearn.linear_model import LogisticRegression

# Each row is one person: [height, weight, shoe_size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40], [190, 90, 47], [175, 64, 39],
[177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]
# The known label for each row of X
Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female', 'female', 'male', 'male']

# Train the model on the labeled examples
LogR = LogisticRegression()
LogR.fit(X, Y)

# Predict the label for a new, unseen person
prediction = LogR.predict([[190, 70, 43]])
print(prediction)
# Output shown on the original slide: ['female']
https://github.com/aclarkData/simple-machine-learning-examples/blob/master/very_simple_examples/logistic_regression.py

Example, Kmeans
• Clustering journal entries
• Essentially, we obtain a month (or any other time period) of journal entries,
“one-hot encode” the non-numerical columns (that is, convert a text value such
as ‘Hello’ into a series of 0s and 1s), and group the entries into a
pre-determined number of groups, for example, 3.
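The steps just described can be sketched as follows. This is a minimal illustration, not the presenter's code: the journal entries, the column names (`account`, `entered_by`, `amount`), and the choice to leave `amount` unscaled are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical journal entries for one period (invented for illustration).
entries = pd.DataFrame({
    "account": ["Cash", "Cash", "Revenue", "Revenue", "Payroll", "Payroll"],
    "entered_by": ["alice", "bob", "alice", "bob", "carol", "carol"],
    "amount": [120.0, 135.5, 980.0, 1010.0, 5400.0, 5600.0],
})

# One-hot encode the non-numerical columns: each distinct text value becomes
# its own 0/1 column (for example, account_Cash, account_Payroll, ...).
encoded = pd.get_dummies(entries, columns=["account", "entered_by"], dtype=float)

# Group the entries into a pre-determined number of clusters, here 3.
# Note: amount is left unscaled for brevity, so it dominates the distances.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(encoded)

entries["cluster"] = labels
print(entries)
```

In practice the numeric columns would usually be rescaled first, so that large amounts do not swamp the one-hot columns when distances are computed.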

Kmeans continued
http://blog.mpacula.com/2011/04/27/k-means-clustering-example-python/

As an auditor, what does this mean for you?
• New opportunities and risks
• Catch-22 of businesses accepting the risk of black boxes or becoming
irrelevant
• Use cases in audit analytics
• More complicated environment, new skills required to understand business
implications and audit algorithms

Use cases in Assurance and Compliance
• Anomaly detection
– Unsupervised journal entry anomaly detection
– Clustering on invoice and AP data for outliers
• ‘Auditor sense’ investigation
– Supervised model for expense report investigation
– Supervised model for journal entries
– AP transactions, customer transactions, etc.
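One way the "clustering for outliers" idea could look in code is sketched below. It is an illustration under stated assumptions, not a method from the presentation: the invoice amounts are invented, and the rule of flagging items more than two standard deviations above the mean distance from their cluster centre is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical invoice amounts: a group of small invoices, a group of large
# invoices, and one amount that fits neither group well.
amounts = np.array(
    [95, 100, 102, 98, 105, 990, 1000, 1010, 1005, 995, 1400], dtype=float
).reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(amounts)

# transform() returns each row's distance to every cluster centre;
# keep only the distance to the centre the row was assigned to.
all_distances = kmeans.transform(amounts)
distances = all_distances[np.arange(len(amounts)), kmeans.labels_]

# Flag invoices that sit unusually far from their own cluster centre.
# The 2-standard-deviation cut-off is an assumption made for this sketch.
threshold = distances.mean() + 2 * distances.std()
flagged = np.where(distances > threshold)[0]

print(amounts[flagged].ravel())
```

The flagged items are candidates for follow-up, not findings: the model only says they look unlike their neighbours.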

The Machine Learning Algorithm Audit
• With algorithms increasingly dictating our lives, how do we know that they
are operating as intended?
– e.g., Weapons of Math Destruction by Cathy O'Neil
• Unfilled role for assurance professionals.
– Review assumptions and, for models where they are available (such as decision trees
and logistic regression), look at the feature weightings in the model.
– Can provide a lot of value using only SDLC audit methodologies

Machine Learning Audit Example – Logistic Regression
>>weights = pd.Series(clf.coef_[0], index=ShoeData.columns)
>>weights
Height      -0.439204
Weight       0.622762
Shoe_size    0.829036
>>weights.plot(kind='bar', title=' …')
https://github.com/aclarkData/simple-machine-learning-examples/blob/master/very_simple_examples/BasicMachineLearning.ipynb
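The snippet on this slide relies on `clf` and `ShoeData` being defined in the linked notebook. A self-contained sketch of the same inspection might look like the following. It assumes `ShoeData` is a pandas DataFrame built from the earlier height/weight/shoe-size data, with column names taken from the weights shown on the slide; `max_iter=1000` is an added assumption to help the solver converge on unscaled data. The exact weights depend on the scikit-learn version and may not match the slide.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Data from the earlier logistic regression example.
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
     [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
     [159, 55, 37], [171, 75, 42], [181, 85, 43]]
Y = ['male', 'male', 'female', 'female', 'male', 'male',
     'female', 'female', 'female', 'male', 'male']

# Assumed layout of the notebook's ShoeData frame.
ShoeData = pd.DataFrame(X, columns=["Height", "Weight", "Shoe_size"])

clf = LogisticRegression(max_iter=1000)
clf.fit(ShoeData, Y)

# One weight per feature. For a two-class model the weights are expressed
# relative to clf.classes_[1], so a positive weight pushes a prediction
# towards that class and a negative weight pushes it away.
weights = pd.Series(clf.coef_[0], index=ShoeData.columns)
print("Weights are relative to class:", clf.classes_[1])
print(weights)
```

Reading the weights this way lets an auditor ask whether the direction and relative size of each feature's influence makes business sense.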

Machine Learning Audit Example – Decision Tree Classifier
>>from sklearn import tree
>>clf = tree.DecisionTreeClassifier()
>>clf.fit(X, Y)
>>prediction = clf.predict([[190, 70, 43]])
>>print(prediction)
['male']
>># class_names takes the target labels (clf.classes_), not the feature columns
>>tree.export_graphviz(clf, feature_names=list(ShoeData.columns), class_names=list(clf.classes_),
out_file='tree.dot')
https://github.com/aclarkData/simple-machine-learning-examples/blob/master/very_simple_examples/BasicMachineLearning.ipynb

http://www.webgraphviz.com/
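As an alternative to rendering the .dot file in a browser, the tree's decision rules can also be printed as plain text. The sketch below uses scikit-learn's `export_text` and `feature_importances_`; it is an illustration rather than the presenter's approach, and the feature names and `random_state=0` are assumptions added for the example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Data from the earlier examples: [height, weight, shoe_size] and labels.
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
     [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
     [159, 55, 37], [171, 75, 42], [181, 85, 43]]
Y = ['male', 'male', 'female', 'female', 'male', 'male',
     'female', 'female', 'female', 'male', 'male']
feature_names = ["Height", "Weight", "Shoe_size"]

# random_state is fixed so that repeated runs build the same tree.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Human-readable if/else rules that the tree applies to reach a decision.
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Relative contribution of each feature to the tree's splits (sums to 1).
importances = dict(zip(feature_names, clf.feature_importances_))
print(importances)
```

The printed rules can be saved as audit evidence and compared between model versions to see whether the decision logic has changed.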

EU’s General Data Protection Regulation (GDPR)
• In April 2016, the EU adopted the General Data Protection Regulation, which
gives citizens and regulators a right to an explanation of algorithmic
decision making.
• Empowers citizens with the ability to understand why they were rejected for
a bank loan, for instance, when the decision was based on an algorithm.

Where can I learn more about Machine Learning?
• Visual Intro, highly recommended, short and sweet
– http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
• Wikipedia
– https://en.wikipedia.org/wiki/Machine_learning
• Good beginning article with some fantastic books
– http://machinelearningmastery.com/4-steps-to-get-started-in-machine-learning/
• Weka
– http://www.cs.waikato.ac.nz/ml/weka/
• Scikit-Learn
– http://scikit-learn.org/

Conclusion and recap
• Definition of Machine Learning
• Buzzword breakdown
• Machine Learning process
• Broad algorithm overview
• Real world use cases
• The Machine Learning Audit
• Where to learn more about Machine Learning

Thank you!
• Email: andrewtaylorclark@gmail.com
• GitHub: aclarkData
• Blog: https://aclarkdata.github.io/
• LinkedIn: www.linkedin.com/in/andrew-clark-b326b767
