In the last decade, we have witnessed an exponential growth in both the complexity and the number of Machine Learning (ML) techniques. As a consequence, leveraging such methods to solve real-world problems has become difficult for a Data Scientist (DS). Automated Machine Learning (AutoML) tools were devised to alleviate that task, but they quickly became as complex as the ML techniques themselves. The DS has started to rely on such tools without understanding how they work, thus losing control over the process.
In this vision paper, we propose HAMLET (Human-centric AutoML via Logic and Argumentation), a framework that would help the DS regain her central role. HAMLET is inspired by the well-known standard process model CRISP-DM. Iteration after iteration, the knowledge is augmented by acquiring more constraints about the problem until a suitable solution is found. HAMLET leverages Logic and Argumentation to merge both constraints and solutions into a uniform human- and machine-readable medium. Not only does it allow an easy exploration of the new knowledge at each iteration, but it also enforces its continuous revision, both by the AutoML tool and through the confrontation between the DS and Domain Experts.
Towards Human-centric AutoML via Logic and Argumentation
1. Towards Human-centric AutoML via Logic and Argumentation
DATAPLAT 2022
Joseph Giovanelli, Giuseppe Pisano
University of Bologna - Alma Mater Studiorum
2. The role of Machine Learning in Data Platforms
3. Finding a solution in Machine Learning tasks
Instantiating an ML pipeline means that:
• at each step, a technique must be selected;
• for each technique, a set of hyper-parameters must be set;
• each hyper-parameter has its own search space.
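To make these three levels concrete, the Python sketch below models a pipeline search space as nested dictionaries: each step maps to its candidate techniques, and each technique maps to its hyper-parameter domains. The step names, techniques, and domains are hypothetical. Domains are kept discrete here so that configurations can be counted; real search spaces often include continuous ranges.

```python
# Hypothetical search space: step -> technique -> hyper-parameter -> domain.
SEARCH_SPACE = {
    "normalisation": {
        "none": {},
        "min_max": {},
        "standard": {"with_mean": [True, False]},
    },
    "classifier": {
        "decision_tree": {"max_depth": [2, 4, 8, None], "criterion": ["gini", "entropy"]},
        "knn": {"n_neighbors": [3, 5, 7]},
    },
}


def count_configurations(space):
    """Count every (technique, hyper-parameter assignment) combination of a pipeline."""
    total = 1
    for techniques in space.values():
        step_total = 0
        for hyper_params in techniques.values():
            # product of the domain sizes; a technique without hyper-parameters counts once
            n = 1
            for domain in hyper_params.values():
                n *= len(domain)
            step_total += n
        # steps are chosen independently, so their counts multiply
        total *= step_total
    return total


print(count_configurations(SEARCH_SPACE))  # 44
```

Even this toy space yields 44 distinct configurations (4 for the normalisation step times 11 for the classifier step), which hints at why exploring it by hand quickly becomes impractical.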
4. CRISP-DM: Cross Industry Standard Process for Data Mining
CRISP-DM enables the exploration of ML constraints:
• data-related;
• algorithm-related;
• domain-related;
• transformation-related;
• cross-cutting (e.g., ethical, legal).
12. AutoML
AutoML aims to automate the instantiation of ML pipelines, but:
• it is difficult to consider all the constraints together;
• it is not transparent;
• it does not allow proper knowledge augmentation.
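AutoML tools automate this instantiation by searching the space for the best-scoring configuration; production tools typically rely on strategies such as Bayesian optimisation. The Python sketch below uses plain random search, the simplest such strategy, over a hypothetical space. The function toy_score is a hypothetical stand-in for training and validating each pipeline.

```python
import random

# Hypothetical, deliberately tiny search space: one technique per step
# plus one hyper-parameter (only meaningful for the decision tree).
SPACE = {
    "normalisation": ["none", "min_max", "standard"],
    "classifier": ["decision_tree", "knn"],
    "max_depth": [2, 4, 8],
}


def toy_score(config):
    """Hypothetical stand-in for a validation score; a real tool would fit and evaluate the pipeline."""
    score = 0.5
    if config["classifier"] == "decision_tree":
        score += 0.1 + 0.01 * config["max_depth"]
    if config["normalisation"] != "none":
        score += 0.05
    return score


def random_search(space, score_fn, budget, seed=0):
    """Sample `budget` configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        # draw one value for every dimension of the search space
        config = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best, score = random_search(SPACE, toy_score, budget=20)
print(best, round(score, 2))
```

The loop returns only the winning configuration and its score. A constraint on which techniques may be combined would have to be hard-wired into SPACE or toy_score, and nothing records why the other configurations lost, which is one way to see the drawbacks listed above.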
13. HAMLET: Human-centric AutoML via Logic and Argumentation
HAMLET leverages:
• Logic to give a structure to the knowledge;
• Argumentation to deal with inconsistencies and revise the results.
14. HAMLET - The LogicalKB
The LogicalKB allows:
• the Data Scientist to structure the ML constraints;
• the AutoML tool to encode the explored results.
15. HAMLET - The Problem Graph
The Problem Graph makes it possible to:
• consider all the ML constraints together;
• set up the AutoML search space;
• discuss and argue about the results.
16. HAMLET - Usage example
Transformations:
• Discretisation (D);
• Normalisation (N).
ML algorithm:
• Decision Tree (DT).
We have 5 possible ML pipelines:
1. DT
2. D → DT
3. N → DT
4. D → N → DT
5. N → D → DT
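The five candidates can be generated mechanically: every ordering of every subset of the transformations, followed by the ML algorithm. A minimal Python sketch, assuming (as in this example) that each transformation appears at most once per pipeline:

```python
from itertools import permutations

TRANSFORMATIONS = ["D", "N"]  # Discretisation, Normalisation
ALGORITHM = "DT"              # Decision Tree


def enumerate_pipelines(transformations, algorithm):
    """Every ordering of every subset of the transformations, each ending with the algorithm."""
    pipelines = []
    for k in range(len(transformations) + 1):
        # permutations(..., 0) yields a single empty tuple, i.e. the bare algorithm
        for prefix in permutations(transformations, k):
            pipelines.append(prefix + (algorithm,))
    return pipelines


for i, p in enumerate(enumerate_pipelines(TRANSFORMATIONS, ALGORITHM), start=1):
    print(f"{i}. {' → '.join(p)}")
```

The output follows the same order as the list above. With t transformations this gives the sum over k of t!/(t-k)! pipelines, so the space grows quickly: three transformations already give 16.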
17. HAMLET - Usage example
Transformations:
• Discretisation (D);
• Normalisation (N).
ML algorithm:
• Decision Tree (DT).
ML constraints:
• algorithm-related: “require D when applying DT”.
The algorithm-related constraint discards the 1st and 3rd ML pipelines:
1. DT ✗
2. D → DT
3. N → DT ✗
4. D → N → DT
5. N → D → DT
18. HAMLET - Usage example
Transformations:
• Discretisation (D);
• Normalisation (N).
ML algorithm:
• Decision Tree (DT).
ML constraints:
• algorithm-related: “require D when applying DT”;
• transformation-related: “no N in pipelines with D”.
The transformation-related constraint discards the 4th and 5th ML pipelines:
1. DT ✗
2. D → DT
3. N → DT ✗
4. D → N → DT ✗
5. N → D → DT ✗
19. HAMLET - Usage example
Transformations:
• Discretisation (D);
• Normalisation (N).
ML algorithm:
• Decision Tree (DT).
ML constraints:
• algorithm-related: “require D when applying DT”;
• transformation-related: “no N in pipelines with D”.
The only valid ML pipeline is the 2nd:
1. DT ✗
2. D → DT ✓
3. N → DT ✗
4. D → N → DT ✗
5. N → D → DT ✗
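The two constraints can be read as logical conditions that every valid pipeline must satisfy. The Python sketch below encodes each one as a predicate over a pipeline; this encoding is an illustrative assumption, not HAMLET's LogicalKB syntax, which the slides do not show.

```python
# The five candidate pipelines of the example, in slide order.
PIPELINES = [
    ("DT",),
    ("D", "DT"),
    ("N", "DT"),
    ("D", "N", "DT"),
    ("N", "D", "DT"),
]


def require_d_when_dt(pipeline):
    """Algorithm-related: 'require D when applying DT' (DT implies D)."""
    return "DT" not in pipeline or "D" in pipeline


def no_n_with_d(pipeline):
    """Transformation-related: 'no N in pipelines with D'."""
    return not ("D" in pipeline and "N" in pipeline)


CONSTRAINTS = [require_d_when_dt, no_n_with_d]


def discarded_by(constraint, pipelines):
    """1-based indices of the pipelines that violate a constraint."""
    return [i for i, p in enumerate(pipelines, start=1) if not constraint(p)]


def valid_pipelines(pipelines, constraints):
    """Pipelines that satisfy every constraint at once."""
    return [p for p in pipelines if all(c(p) for c in constraints)]


print(discarded_by(require_d_when_dt, PIPELINES))  # [1, 3]
print(discarded_by(no_n_with_d, PIPELINES))        # [4, 5]
print(valid_pipelines(PIPELINES, CONSTRAINTS))     # [('D', 'DT')]
```

Keeping each constraint as a separate predicate makes it easy to report which constraint discarded which pipeline, mirroring the step-by-step elimination above.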
20. HAMLET - Usage example
Transformations:
• Discretisation (D);
• Normalisation (N).
ML algorithm:
• Decision Tree (DT).
ML constraints:
• algorithm-related: “require D when applying DT”;
• transformation-related: “no N in pipelines with D”.
[Figures: the LogicalKB and the Problem Graph for this example]
Usage example available at: https://queueinc.github.io/HAMLET-DATAPLAT2022/
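One way to read the Problem Graph is as an abstract argumentation framework in the sense of Dung: each candidate pipeline and each constraint is an argument, and a constraint attacks the pipelines it discards. The Python sketch below assumes grounded semantics (the least fixed point of the characteristic function); the slides do not state which semantics HAMLET adopts, and the argument names and the objection R are hypothetical. The accepted pipelines form the AutoML search space, and adding a counter-argument shows how the DS or a Domain Expert could revise it.

```python
def grounded_extension(arguments, attacks):
    """Least fixed point of Dung's characteristic function; attacks are (attacker, target) pairs."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        # an argument is defended if each of its attackers is attacked by an accepted argument
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


PIPELINES = {"P1": "DT", "P2": "D → DT", "P3": "N → DT", "P4": "D → N → DT", "P5": "N → D → DT"}
ATTACKS = {
    ("C_alg", "P1"), ("C_alg", "P3"),  # "require D when applying DT"
    ("C_tr", "P4"), ("C_tr", "P5"),    # "no N in pipelines with D"
}
ARGUMENTS = set(PIPELINES) | {"C_alg", "C_tr"}


def search_space(arguments, attacks):
    """Pipeline arguments that survive under grounded semantics."""
    return sorted(p for p in grounded_extension(arguments, attacks) if p in PIPELINES)


print(search_space(ARGUMENTS, ATTACKS))  # ['P2']

# Hypothetical revision: an objection R attacks the transformation-related
# constraint, which defeats C_tr and reinstates P4 and P5.
print(search_space(ARGUMENTS | {"R"}, ATTACKS | {("R", "C_tr")}))  # ['P2', 'P4', 'P5']
```

Because acceptance is recomputed from the whole graph, a new argument can reinstate previously discarded pipelines without editing the original constraints.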
21. Conclusions and future work
Contributions:
• structure the ML constraints and the AutoML solutions in a LogicalKB;
• parse the structured LogicalKB into a human- and machine-readable
medium called Problem Graph;
• leverage the Problem Graph to set up an AutoML search space;
• leverage the Problem Graph to allow both the DS and an AutoML tool to
revise the current knowledge.
Future work:
• sound formalisation;
• implementation;
• evaluation of efficacy and benefits on real-world problems.