by Joakim Sandroos, Senior Data Scientist at Danish Business Authority
At the Danish Business Authority (DBA), machine learning (ML) is used for decision support. To build ethical ML on a solid scientific understanding, explainability and traceability are mission-critical. DBA uses an in-house Directed Acyclic Graph (DAG) tool, RecordKeeper, to preserve causality information between business events on its platform. Via flow analysis, the team identifies Springs and Sinks in its dataset to mitigate overall model bias.
2. Danish Business Authority (DBA)
● Business registrations
● Central Business Registry (CVR)
● Fiscal report audits
● Business support schemes (e.g. Covid-19, IT security, etc.)
● Legal oversight & control
ML-Lab IKP
3. Why Machine Learning
• Make it easy to be a law-abiding company
AND: Make it hard to swindle
• ~800,000 companies in Denmark – impossible to check everything by hand
• Focus efforts where most needed
• Requires data, infrastructure and software tools
4. Intelligent Kubernetes Platform
Four Main Components:
● Kubernilla: Vanilla version of Kubernetes, highly opinionated
● RaceTrack: Deployment system (www.github.com/theracetrack)
● CatWalk: Evaluation component
● RecordKeeper: Platform-wide system event logger
Plus: Data Warehouse (PostgreSQL, Neo4j)
Development:
Idempotent system design
Infrastructure as Code
One source of truth
5. Knowledge Graph (PostgreSQL → Neo4j)
● CVR (businesses, people, addresses, etc.)
● DBA Cases
● Fiscal Reports
...and much more...
● Labels: 50
● Relationship types: 41
● Node Properties: 237
● Nodes: 445 million
● Edges: 688 million
→ Forms basis for ML efforts
16. Pitfalls
● ML: It is easy to do something, but also extremely easy to do it wrong
● Any ML model reflects its training data
● ML is only as strong as the data
17. Doing it wrong: Unethical AI
United States: Repeat criminal offenders
● Guided prison sentence lengths
● Biased against people of color
Netherlands: Child care benefits fraud
● Tens of thousands of families affected
● Many low-income families
● Many pushed into poverty
● Several suicides
● Government resigned
18. Motivation / Bias
● Build fair & ethical models
● EU: Artificial Intelligence Act (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206)
● EU: GDPR (https://eur-lex.europa.eu/eli/reg/2016/679/oj)
[Figure: data ‘landscape’ with regions labeled used data, known unknown, and unknown unknown]
19. Motivation / Bias
● Challenge: Follow data trail, explain origin of knowledge and conclusions
● Our Answers: RecordKeeper & X-Rai framework
[Transparent, Responsible, Explainable AI: https://pure.itu.dk/en/publications/x-rai-a-framework-for-the-transparent-responsible-and-accurate-us]
[Figure: data ‘landscape’ with regions labeled used data, known population, and unknown population]
20. ML at the Business Authority
● Need for complete traceability
[Figure: traceability need]
24. RecordKeeper: System Event Logger
● Server / Client system, Python
● Passive component: Listening only
● Platform Event Message (PEM):
– One action on the cluster, unique ID
– Emitter ID
– Predecessor ID known
– Artifacts: Data references
● Builds graph of PEMs and Artifacts
→ Facilitates explainability on the cluster
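The PEM fields listed above can be pictured as a small record type plus a passive log that links each message to its predecessor and to its artifact references. This is a minimal sketch, not RecordKeeper's actual implementation: the class names `PEM` and `EventLog`, all field names, and the example emitter and artifact strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple
from uuid import uuid4


@dataclass(frozen=True)
class PEM:
    """One action on the cluster. All field names here are hypothetical."""
    emitter_id: str                       # component that emitted the event
    predecessor_id: Optional[str] = None  # PEM that led to this one, if any
    artifacts: Tuple[str, ...] = ()       # references to data, not the data itself
    pem_id: str = field(default_factory=lambda: uuid4().hex)  # unique ID


class EventLog:
    """Passive listener: it only records the PEMs it is handed."""

    def __init__(self) -> None:
        self.pems: Dict[str, PEM] = {}
        self.pem_edges: List[Tuple[str, str]] = []       # (predecessor, successor)
        self.artifact_edges: List[Tuple[str, str]] = []  # (pem, artifact reference)

    def record(self, pem: PEM) -> None:
        if pem.pem_id in self.pems:
            return  # replaying the same message changes nothing
        if pem.predecessor_id is not None:
            # A predecessor must already be logged, so no cycle can form.
            if pem.predecessor_id not in self.pems:
                raise KeyError(f"unknown predecessor {pem.predecessor_id}")
            self.pem_edges.append((pem.predecessor_id, pem.pem_id))
        self.pems[pem.pem_id] = pem
        for ref in pem.artifacts:
            self.artifact_edges.append((pem.pem_id, ref))


log = EventLog()
ingest = PEM(emitter_id="data-ingest", artifacts=("raw-batch",))
training = PEM(
    emitter_id="model-training",
    predecessor_id=ingest.pem_id,
    artifacts=("raw-batch", "model-v1"),
)
log.record(ingest)
log.record(training)
log.record(training)  # duplicate delivery is ignored
```

Requiring the predecessor to exist before a PEM is accepted is one simple way to keep the graph acyclic by construction; whether RecordKeeper enforces this at write time is not stated in the slides.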
25. PEM Directed Acyclic Graphs (DAG)
● Each event creates a PEM
● PEMs can create or reference artifacts
[Figure: components (emitters) Data ingest, Data Warehouse, and Model Training; DAG of PEM 1, PEM 2, and PEM 3; artifacts, which reference the main knowledge graph]
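Once events are stored as a DAG, explaining a result amounts to walking predecessor links backwards. The sketch below assumes a linear chain PEM 1 → PEM 2 → PEM 3 across the three components named in the figure; the exact wiring of the original diagram is not recoverable, and the IDs, artifact names, and helper functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# pem_id -> (emitter, predecessor pem_id, artifact references); illustrative data only
PemTable = Dict[str, Tuple[str, Optional[str], List[str]]]

PEMS: PemTable = {
    "pem-1": ("data-ingest", None, ["artifact:raw-batch"]),
    "pem-2": ("data-warehouse", "pem-1", ["artifact:raw-batch", "artifact:table"]),
    "pem-3": ("model-training", "pem-2", ["artifact:table", "artifact:model"]),
}


def lineage(pem_id: str, pems: PemTable) -> List[str]:
    """Follow predecessor links back to the root and return the chain oldest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = pem_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected; the PEM graph must be acyclic")
        seen.add(current)
        chain.append(current)
        current = pems[current][1]
    return list(reversed(chain))


def artifacts_behind(pem_id: str, pems: PemTable) -> List[str]:
    """Collect every artifact reference touched upstream, in first-seen order."""
    refs: List[str] = []
    for pid in lineage(pem_id, pems):
        for ref in pems[pid][2]:
            if ref not in refs:
                refs.append(ref)
    return refs


trail = lineage("pem-3", PEMS)
inputs = artifacts_behind("pem-3", PEMS)
```

Here `trail` lists the events that led to the trained model and `inputs` lists the data references involved along the way, which is the kind of question the traceability requirement asks.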
27. Flow Networks
● Edges as ‘action paths’
● Probability representations
● Inspired by Bengio et al.: [https://arxiv.org/abs/2106.04399v2]
[Flow software package: https://github.com/GFNOrg/gflownet]
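One way to read "probability representations" is to normalise the flow leaving each node, so that every edge (an 'action path') carries the probability of being taken from its source. The sketch below does this in plain Python and does not use the gflownet package; the function name, node names, and flow values are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

EdgeFlows = Dict[Tuple[str, str], float]  # (source, target) -> non-negative flow


def action_probabilities(edge_flows: EdgeFlows) -> Dict[str, Dict[str, float]]:
    """Turn edge flows into per-source probabilities: flow divided by total outflow."""
    outflow: Dict[str, float] = defaultdict(float)
    for (src, _dst), flow in edge_flows.items():
        if flow < 0:
            raise ValueError("flows must be non-negative")
        outflow[src] += flow

    probs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src, dst), flow in edge_flows.items():
        if outflow[src] > 0:  # sources with zero total outflow get no distribution
            probs[src][dst] = flow / outflow[src]
    return dict(probs)


flows = {
    ("data", "model_a"): 3.0,
    ("data", "model_b"): 1.0,
    ("model_a", "user"): 3.0,
    ("model_b", "user"): 1.0,
}
probs = action_probabilities(flows)
```

With these illustrative numbers, three quarters of the flow leaving `data` goes through `model_a`, so that action path gets probability 0.75.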
28. Explainability & Bias detection
● Trace out data usage
● PageRank for node importance
● Bias Detection
– at training and runtime
– sink scores
[Figure: flow network connecting data, ML-models, and user]
$$S_s = \sum_{a'} F(s, a') - \sum_{a} F(s, a)$$
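The sink score can be computed per node from edge flows. The slide does not say which sum covers incoming and which covers outgoing flow, so the sketch below assumes the score is inflow minus outflow: positive values mark sinks (flow accumulates) and negative values mark springs (flow originates). The function name and the example flows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

EdgeFlows = Dict[Tuple[str, str], float]  # (source, target) -> flow


def sink_scores(edge_flows: EdgeFlows) -> Dict[str, float]:
    """Assumed reading of the sink score: total inflow minus total outflow per node."""
    inflow: Dict[str, float] = defaultdict(float)
    outflow: Dict[str, float] = defaultdict(float)
    nodes = set()
    for (src, dst), flow in edge_flows.items():
        outflow[src] += flow
        inflow[dst] += flow
        nodes.update((src, dst))
    return {node: inflow[node] - outflow[node] for node in nodes}


flows = {
    ("data", "model_a"): 3.0,
    ("data", "model_b"): 1.0,
    ("model_a", "user"): 2.0,  # one unit entering model_a never reaches the user
    ("model_b", "user"): 1.0,
}
scores = sink_scores(flows)
```

In this toy network `data` acts as a spring and `user` as a sink, while `model_a` retains one unit of flow; a model that absorbs flow without passing it on is the kind of node such a score would surface for closer inspection.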
29. Explainability & Bias detection
● Trace out data usage
● PageRank for node importance
● Bias Detection
– at training and runtime
– sink scores
● Data driven insights for explainability, model retirement or re-training
[Figure: flow network connecting data, ML-models, service, and Consumer, annotated with ML-scores]
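PageRank for node importance can be illustrated with a short power iteration over a directed edge list. This is a generic textbook version written in plain Python, not the implementation used on the platform; the damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing edges, and the example graph are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def pagerank(
    edges: Iterable[Tuple[str, str]],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Power-iteration PageRank; rank held by dangling nodes is spread uniformly."""
    edge_list = list(edges)
    nodes = sorted({node for edge in edge_list for node in edge})
    if not nodes:
        return {}
    successors: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edge_list:
        successors[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not successors[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in successors[v]:
                new_rank[w] += damping * rank[v] / len(successors[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


edges = [
    ("data", "model_a"),
    ("data", "model_b"),
    ("model_a", "service"),
    ("model_b", "service"),
    ("service", "consumer"),
]
ranks = pagerank(edges)
most_important = max(ranks, key=ranks.get)
```

In this toy graph every path ends at `consumer`, so it collects the highest rank; on the PEM graph the same calculation would highlight the data and models that the most downstream activity depends on.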
30. Idea: Meta Model
● Reward: ML-Score
● Train Graph Neural Network
● Learn flow structure
● Meta Tensor Model across data, actions and scores
[Figure: flow network connecting data, ML-models, and user, annotated with ML-scores]
31. Closing Remarks
● Knowledge Graphs facilitate ML efforts at the Danish Business Authority
● Focus on Transparent, Responsible and Explainable AI (X-Rai)
● RecordKeeper generates causal knowledge graphs (explainability, bias mitigation, flow tensor models)
Open Sourcing main components
RaceTrack, adaptable launch system already publicly available at:
https://github.com/theracetrack