Enterprise Machine Learning Governance

7th August 2019

About me: Terence Siganakis
• BSc (Computer Science) / MSc (Bioinformatics)
• Former: Cancer research at Peter Mac (Genome analysis)
• Former: CTO of Gooroo (ASX: GOO)
• Current: CEO of Growing Data
• Data Science / Engineering consultancy
• 20 Data Scientists, Software Engineers
• Work with Enterprises like ANZ, CSL, DHHS, Metricon

Enterprise Machine Learning Governance
• Governance & why it matters
• Machine Learning in the enterprise
• Well Governed Machine Learning
• Machine Learning Governance Architecture

Governance
Governance encompasses the system by which an organisation
is controlled and operates, and the mechanisms by which it, and
its people, are held to account.
Governance Institute of Australia

Data Governance: Why bother?
• Reputation
• Data breaches
• Legislation
• Privacy Act
• GDPR, Sarbanes-Oxley
• Regulation
• APRA, ASIC, etc

ML Governance: Why bother?
It’s the price of admission into meaningful problems
(& it makes development easier and faster)

Machine Learning in the Enterprise

Machine Learning → Decisions
• Predictions relate to decisions
• What movie should I recommend? (Video store clerk)
• What should their credit rating be? (Credit analyst)
• Often these decisions were made by people you could train and (if need be) fire.
• Someone owns the risk & is accountable
• ML lets us make decisions at scale
• Based on more data, much faster (larger impact of failures)
• What happens when it all goes wrong?
• Who owns the risk?
• Impact on future projects / buy in for ML generally

Machine Learning Risks
• Inaccurate predictions
• Poorly performing models when deployed
• “Good” predictions gone bad
• Models that perform well, but are problematic
• Bias based on protected features
• Sexist, Racist
• Feedback loops
• Predictions based on bias reinforce that bias

Enterprise Machine Learning
• Enterprises are risk averse
• More to lose
• Regulatory risk, Reputational risk, Financial risk
• Enterprises have more checks and balances
• More people to convince the solution works
• More people to convince the solution won’t break
• More people to convince the solution won’t get them fired!
• Who owns the risk?
• Who gets fired if there is a problem?

Well Governed Machine Learning

Goals for ML Governance:
Deployments should be:
• Testable
• Reliable
• Monitored
Training should be:
• Reproducible
• Traceable
• Explainable
• Documented

Reproducible
It should be possible to easily regenerate a model and its
predictions from the same source data
• “But it works on my laptop” is never acceptable
• Track source code versions (git SHA) and pin package versions (e.g. with Docker images)
• Track the data that was used to train the model
• Requires storing a lot of “versioned” data
• Store random seeds!
It makes debugging easier! (I can reproduce the broken model, locally)
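A minimal sketch of the tracking described above, assuming a hypothetical manifest schema (the field names, the `abc1234` SHA and the toy `train` function are illustrative only and not from the talk). It records the code version, a fingerprint of the training data and the random seed, then shows that re-running with the recorded seed gives the same result.

```python
import hashlib
import json
import platform
import random


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact training data."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(git_sha: str, data: bytes, seed: int) -> dict:
    """Collect what is needed to regenerate a model (hypothetical schema)."""
    return {
        "git_sha": git_sha,
        "data_sha256": fingerprint(data),
        "seed": seed,
        "python_version": platform.python_version(),
    }


def train(data: bytes, seed: int) -> list:
    """Stand-in for training: a seeded shuffle of the input rows."""
    rng = random.Random(seed)  # local generator, no hidden global state
    rows = data.decode().splitlines()
    rng.shuffle(rows)
    return rows


data = b"a,1\nb,2\nc,3\nd,4\n"
manifest = build_manifest(git_sha="abc1234", data=data, seed=42)
print(json.dumps(manifest, indent=2))

# Same data + same recorded seed => same "model".
first_run = train(data, manifest["seed"])
second_run = train(data, manifest["seed"])
print("reproducible:", first_run == second_run)
```

In practice the git SHA would be read from the repository and the manifest stored next to the model artefact, so a broken model can be rebuilt locally from its manifest.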

Traceable
It should be possible to track the origins of data, and all the
processing steps which have been applied to the data.
• Micro-service based architectures lead to large numbers of data silos
• Data pipelines / ETL are increasingly complex
• Broken pipelines lead to Broken models
• Identifying the origin of data-related errors is extremely time-consuming
It makes debugging easier! (I can track issues back to systems)
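One way to make processing steps traceable is to log every step with a hash of its input and output. The sketch below is an assumption about how that could look: the `Lineage` class, the `crm.customers` source name and the two transformation functions are all hypothetical.

```python
import hashlib
import json


def digest(rows: list) -> str:
    """Short, stable hash of a list of JSON-serialisable rows."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


class Lineage:
    """Carries data through a pipeline and records every step applied."""

    def __init__(self, source: str, rows: list):
        self.rows = rows
        self.steps = [{"step": "load", "source": source, "output": digest(rows)}]

    def apply(self, name: str, func) -> "Lineage":
        before = digest(self.rows)
        self.rows = func(self.rows)
        self.steps.append(
            {"step": name, "input": before, "output": digest(self.rows)}
        )
        return self


def drop_missing(rows: list) -> list:
    return [r for r in rows if r["income"] is not None]


def to_thousands(rows: list) -> list:
    return [{**r, "income": r["income"] / 1000} for r in rows]


raw = [{"id": 1, "income": 52000}, {"id": 2, "income": None}]
lineage = (
    Lineage("crm.customers", raw)
    .apply("drop_missing", drop_missing)
    .apply("to_thousands", to_thousands)
)

for step in lineage.steps:
    print(step)
```

Because each step's input hash equals the previous step's output hash, a bad value in the training set can be walked back step by step to the system it came from.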

Explainable
It should be easy to understand why the model made a
certain prediction
• Management & Regulators are often not trusting people
• Visualization makes it easier to communicate why a new model is better than an old one
• Ensure visualizations are outputs of model training
It makes debugging easier! (I can see where it went wrong)
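For a simple linear scoring model, "why did the model make this prediction" can be answered by listing each feature's contribution (weight × value). This is only an illustrative sketch: the weights, feature names and applicant are invented, and more complex models need dedicated explanation techniques that the talk does not name.

```python
def explain(weights: dict, bias: float, features: dict):
    """Return the score and per-feature contributions, largest effect first."""
    contributions = {name: weights[name] * features[name] for name in weights}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical credit-scoring weights and applicant.
weights = {"income_k": 0.5, "missed_payments": -2.0, "years_at_address": 0.25}
applicant = {"income_k": 10, "missed_payments": 3, "years_at_address": 4}

score, ranked = explain(weights, bias=1.0, features=applicant)

print(f"score = {score}")
for name, contribution in ranked:
    print(f"{name:>18}: {contribution:+.2f}")
```

Writing this breakdown (or a chart built from it) during training means the explanation is produced alongside the model rather than reconstructed later.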

Documented
ML Models are inherently complex and need to be
documented so that they can be understood by others.
• Compliance: Document that the process is being followed
• Education: A colleague should be productive relying only on docs
• Versioned: Documentation needs to be living (and history is useful)
• Service Level Objectives
• How long should it take to train?
• How long should predictions take?
• What level of accuracy, across what measures is acceptable?
• Service Level Indicators
• Metrics for whether or not we are meeting SLOs
It makes debugging easier! (I can get up to speed faster)
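Service Level Objectives can live in version control as data, with Service Level Indicators checked against them. The sketch below assumes made-up thresholds (60 minutes, 200 ms, 90% accuracy) and hypothetical names; real objectives would be agreed with the model's owners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """One objective: a named threshold and which direction is good."""

    name: str
    threshold: float
    higher_is_better: bool

    def is_met(self, indicator: float) -> bool:
        if self.higher_is_better:
            return indicator >= self.threshold
        return indicator <= self.threshold


# Illustrative objectives only; the numbers are assumptions.
SLOS = [
    Slo("training_minutes", 60.0, higher_is_better=False),
    Slo("p95_prediction_ms", 200.0, higher_is_better=False),
    Slo("accuracy", 0.90, higher_is_better=True),
]


def evaluate(slos: list, indicators: dict) -> dict:
    """Compare measured indicators (SLIs) with their objectives (SLOs)."""
    return {slo.name: slo.is_met(indicators[slo.name]) for slo in slos}


# Hypothetical measurements from the latest release.
indicators = {"training_minutes": 42.0, "p95_prediction_ms": 250.0, "accuracy": 0.93}
report = evaluate(SLOS, indicators)

for name, met in report.items():
    print(f"{name}: {'met' if met else 'MISSED'}")
```

Keeping the objectives in the same repository as the model gives the documentation a history for free.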

Testable
Each machine learning model should have a suite of tests,
ensuring that it not only scores well, but is resistant to bias
and can handle extreme values
• Performance testing against truly unseen data
• Validation of inputs
• Checks to ensure predictions are not biased
• Checks to prevent feedback loops
It makes debugging easier! (I can isolate errors, prevent regressions)
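A sketch of what such a test suite might contain, using a toy rule-based model as a stand-in. Everything here is an assumption for illustration: the `predict` rule, the plausible income range, the hold-out rows and the 0.1 approval-rate tolerance. It covers three of the checks above (unseen data, input validation, bias); feedback-loop checks are not shown.

```python
def validate(applicant: dict) -> None:
    """Reject missing, non-numeric or extreme inputs before predicting."""
    income = applicant.get("income_k")
    if isinstance(income, bool) or not isinstance(income, (int, float)):
        raise ValueError("income_k must be a number")
    if not 0 <= income <= 10_000:  # assumed plausible range; also rejects NaN
        raise ValueError("income_k outside the plausible range")


def predict(applicant: dict) -> int:
    """Toy stand-in model: approve (1) when income is at least 50."""
    validate(applicant)
    return 1 if applicant["income_k"] >= 50 else 0


def accuracy(rows: list) -> float:
    return sum(predict(r) == r["label"] for r in rows) / len(rows)


def approval_rate_gap(rows: list) -> float:
    """Largest difference in approval rate between any two groups."""
    by_group = {}
    for r in rows:
        by_group.setdefault(r["group"], []).append(predict(r))
    rates = [sum(p) / len(p) for p in by_group.values()]
    return max(rates) - min(rates)


# Hypothetical hold-out set the model never saw during training.
HOLDOUT = [
    {"income_k": 80, "group": "a", "label": 1},
    {"income_k": 30, "group": "a", "label": 0},
    {"income_k": 65, "group": "b", "label": 1},
    {"income_k": 20, "group": "b", "label": 0},
]


def test_scores_well_on_unseen_data():
    assert accuracy(HOLDOUT) >= 0.9


def test_rejects_extreme_values():
    for bad in ({"income_k": -5}, {"income_k": 1e9}, {}):
        try:
            predict(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted invalid input: {bad}")


def test_groups_are_treated_alike():
    assert approval_rate_gap(HOLDOUT) <= 0.1


for test in (
    test_scores_well_on_unseen_data,
    test_rejects_extreme_values,
    test_groups_are_treated_alike,
):
    test()
print("all checks passed")
```

The approval-rate gap is only one possible fairness measure; which measure is appropriate depends on the decision being made.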

Reliable
Predictions should be reliable in the face of unseen data,
even where the unseen data is hostile
• ML Models may handle unseen / improbable data poorly (Black Swans)
• Consider adversarial examples which may lead to poor predictions
• The envelope of reliability should be well understood, and predictions outside of it should be avoided
Fewer edge cases to consider! (Predictions are only made when confident)
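One simple reading of an "envelope of reliability" is the range of values seen during training, with the model declining to predict outside it. The sketch below assumes that reading; the `Envelope` class, the feature names and the toy model are hypothetical, and a per-feature min/max check will not catch adversarial inputs that stay inside those ranges.

```python
from typing import Callable, Optional


class Envelope:
    """Per-feature min/max bounds observed in the training data."""

    def __init__(self, training_rows: list):
        names = training_rows[0].keys()
        self.bounds = {
            n: (min(r[n] for r in training_rows), max(r[n] for r in training_rows))
            for n in names
        }

    def contains(self, row: dict) -> bool:
        return all(lo <= row[n] <= hi for n, (lo, hi) in self.bounds.items())


def guarded_predict(model: Callable, envelope: Envelope, row: dict) -> Optional[float]:
    """Predict only inside the envelope; return None (abstain) otherwise."""
    if not envelope.contains(row):
        return None
    return model(row)


def model(row: dict) -> float:
    """Toy stand-in for a trained model."""
    return row["income_k"] / 100


training = [
    {"age": 18, "income_k": 20},
    {"age": 45, "income_k": 150},
    {"age": 70, "income_k": 60},
]
envelope = Envelope(training)

typical = guarded_predict(model, envelope, {"age": 30, "income_k": 60})
outlier = guarded_predict(model, envelope, {"age": 30, "income_k": 5000})
print("typical:", typical)
print("outlier:", outlier)
```

An abstained prediction can then be routed to a person or a fallback rule instead of being served.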

Monitored
Machine Learning models need to be monitored to ensure
that their performance meets expectations
• The performance of models will change over time
• Have usage patterns / source data changed?
• Has a dependent system changed?
• Has data processing changed?
It makes debugging easier! (I can see when the model broke and why)
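A minimal sketch of performance monitoring, assuming true outcomes eventually become available for comparison. The `AccuracyMonitor` class, the window of 4 and the 0.75 minimum are illustrative assumptions; production monitoring would also watch input distributions and upstream systems.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent predictions."""

    def __init__(self, window: int, minimum: float):
        self.outcomes = deque(maxlen=window)  # old results fall off the end
        self.minimum = minimum

    def record(self, predicted, actual) -> None:
        self.outcomes.append(predicted == actual)

    @property
    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_healthy(self) -> bool:
        """Healthy until a full window shows accuracy below the minimum."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return True
        return self.accuracy >= self.minimum


monitor = AccuracyMonitor(window=4, minimum=0.75)

# Four correct predictions: the model meets expectations.
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy_before = monitor.is_healthy()

# Two misses in a row, e.g. after an upstream data change.
for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
healthy_after = monitor.is_healthy()

print("before:", healthy_before, "after:", healthy_after)
```

Logging the time at which `is_healthy()` first turns false gives a starting point for matching the drop against changes in usage, dependent systems or data processing.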

Machine Learning Governance Architecture

Architecture: Train, Tune & Test

Architecture: Deploy & Monitor

Improved Governance is a Journey
• Improve governance with each release
• Create more controls
• Create more test cases
• Improve monitoring
• Improve process compliance

Thank you!
Please reach out to me for a coffee if you would like to discuss further
• terence@growingdata.com.au
• https://growingdata.com.au
