The document proposes EMMM, a unified meta-model for tracking machine learning experiments across different tools. The authors analyzed existing ML experiment management tools to extract common asset types and relationships. The meta-model was designed in three phases and validated with example experiments. EMMM is formalized in Ecore and can enable interoperability between tools by providing a common representation. The meta-model could also serve as a blueprint for developing new ML experiment tools and for connecting ML asset management to model-driven engineering practices. Future work includes making EMMM configurable and unifying additional tools proposed in academic research.
EMMM: A Unified Meta-Model for ML Experiments
1. EMMM: A Unified Meta-Model for Tracking Machine Learning Experiments
Samuel Idowu, Daniel Strüber, and Thorsten Berger
2. 2021-01-20
Introduction
ML-based software systems vs. traditional software systems
ML experiments
F. Kumeno, “Sofware engineering challenges for machine learning applications: A literature review,” Intell. Decis. Technol., vol. 13, 2020
A. Arpteg, B. Brinne, L. Crnkovic-Friis, and J. Bosch, “Software Engineering Challenges of Deep Learning,” in 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2018
C. Hill, R. Bellamy, T. Erickson, and M. Burnett, “Trials and tribulations of developers of intelligent systems: A field study,” in 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2016
3. 2021-01-20
Introduction
ML experiments
Characteristics
★ Non-linear
★ Trial and error
★ Exploratory and intuition-based
★ Generate multiple asset versions
Asset Management Approaches
★ Level 1: Ad hoc approaches, e.g., dedicated naming conventions for folders and files
★ Level 2: Git or other VCSs and dedicated databases
★ Level 3: ML experiment management tools
4. 2021-01-20
Experiment management tools
Specialized tools for managing ML-specific assets such as features, hyperparameters, models, and evaluation metrics
★ Examples:
○ MLflow, Neptune, DVC
★ Systematic approach to managing ML asset versions
★ Support various ML experiment concerns
○ E.g., reproducibility, traceability, reusability
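To make the idea of versioned ML assets concrete, here is a minimal sketch in plain Python. It is not the API of MLflow, Neptune, or DVC. The names `AssetVersion`, `RunRecord`, and `content_version` are hypothetical, and the sketch assumes that an asset version can be identified by a hash of its content.

```python
import hashlib
from dataclasses import dataclass, field


def content_version(payload: bytes) -> str:
    """Derive a short version id from an asset's content (assumption: content-addressed versions)."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class AssetVersion:
    name: str      # e.g. "train.csv" or "model.pkl"
    kind: str      # e.g. "dataset" or "model"
    version: str   # content-derived id


@dataclass
class RunRecord:
    """One experiment run: its inputs (hyperparameters, assets) and outputs (metrics)."""
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    assets: list = field(default_factory=list)

    def track(self, name: str, kind: str, payload: bytes) -> AssetVersion:
        # Record a new version of the named asset for this run.
        asset = AssetVersion(name, kind, content_version(payload))
        self.assets.append(asset)
        return asset


run = RunRecord(hyperparameters={"learning_rate": 0.01})
v1 = run.track("train.csv", "dataset", b"a,b\n1,2\n")
v2 = run.track("train.csv", "dataset", b"a,b\n1,2\n3,4\n")  # changed content, new version
run.metrics["accuracy"] = 0.9
```

Because the version id depends only on content, re-tracking an unchanged file yields the same id, which is one simple way to support reproducibility and traceability.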
5. 2021-01-20
Motivation & Goals
Challenge
Existing tools are not yet mature enough to support large-scale ML-based software development
★ Most tools currently target data scientists
★ Less focus on collaboration
★ Current operations on tracked data and assets are very basic
★ Lack of interoperability among existing tools
★ Lack of integration with established SE tools
Goals
★ Establish a unified blueprint of the core structures and relationships in existing tools
★ Useful for tool developers and researchers
★ Move towards domain-specific operations for ML assets
Long-term Goal
Unified and effective ML experiment management tools integrated with traditional software engineering tools such as IDEs and VCSs.
6. 2021-01-20
Methods
★ Explored the versioning support offered by a number of experiment management tools
★ Observed and extracted the ML asset types (structures) they support and their versioning relationships
★ Unified their conceptual structures and relationships using a meta-model
★ Domain modeling in three phases:
○ Initial design of the meta-model to establish classes and their relationships
○ Refinement of the structure and the class relationships through an iterative process
○ Validation: creating instances of concrete experiments with their revision histories to reveal design flaws and identify improvement opportunities
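The validation phase can be illustrated with a toy example. The sketch below is not EMMM itself. It assumes a drastically simplified meta-model with three hypothetical classes (`Experiment`, `Revision`, `Asset`) and one versioning relationship (`previous`), and shows how instantiating a concrete revision history can expose a modeling problem.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    name: str
    kind: str                            # e.g. "dataset", "model", "hyperparameters"
    previous: Optional["Asset"] = None   # versioning relationship to the prior version

    def history(self) -> list:
        """Return this asset's versions, oldest first, by following `previous` links."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.previous
        return list(reversed(chain))


@dataclass
class Revision:
    label: str
    assets: list = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    revisions: list = field(default_factory=list)

    def commit(self, label: str, assets: list) -> Revision:
        rev = Revision(label, list(assets))
        self.revisions.append(rev)
        return rev

    def dangling_assets(self) -> list:
        """Validation check: assets whose previous version appears in no earlier revision."""
        seen, problems = set(), []
        for rev in self.revisions:
            for asset in rev.assets:
                if asset.previous is not None and id(asset.previous) not in seen:
                    problems.append(asset)
            seen.update(id(a) for a in rev.assets)
        return problems


# Instantiate a small revision history, as in the validation phase.
exp = Experiment("demo")
model_v1 = Asset("model", "model")
exp.commit("r1", [model_v1])
model_v2 = Asset("model", "model", previous=model_v1)
exp.commit("r2", [model_v2])
```

Here `exp.dangling_assets()` returns an empty list, while committing an asset whose `previous` version was never recorded would be flagged. Checks of this kind are one way instance creation can reveal gaps in a draft design.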
Idowu, S., Strüber, D., & Berger, T. (2021, May). Asset management in machine learning: a survey. In 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (pp. 51-60). IEEE.
7. 2021-01-20
Result - EMMM
★ Ready-to-use software artifact, formalized in Ecore
★ Usable to facilitate tool development
★ New experiment instances can be created and manipulated via the meta-model’s EMF-generated code and its APIs
9. 2021-01-20
What’s next?
Use cases:
★ Enabling interoperability: Tool developers can write import and export functions targeting our meta-model
★ Blueprint for developing new tools: Developers of tools or extensions could represent the ML-specific information of a revision history as instances of our meta-model
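The interoperability use case can be sketched as a pair of converters around a shared representation. Everything in this example is hypothetical: `CommonRun` stands in for an instance of the meta-model, and the "Tool A" and "Tool B" record layouts are invented for illustration and do not correspond to any real tool's format.

```python
from dataclasses import dataclass, field


@dataclass
class CommonRun:
    """Hypothetical stand-in for an instance of the shared meta-model."""
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def import_from_tool_a(record: dict) -> CommonRun:
    # Assumed Tool A layout: {"params": {...}, "metrics": {...}}
    return CommonRun(dict(record.get("params", {})), dict(record.get("metrics", {})))


def export_to_tool_b(run: CommonRun) -> list:
    # Assumed Tool B layout: flat rows of {"type", "key", "value"}
    rows = [{"type": "param", "key": k, "value": v}
            for k, v in sorted(run.hyperparameters.items())]
    rows += [{"type": "metric", "key": k, "value": v}
             for k, v in sorted(run.metrics.items())]
    return rows


def import_from_tool_b(rows: list) -> CommonRun:
    # Inverse of export_to_tool_b: rebuild the common representation from flat rows.
    run = CommonRun()
    for row in rows:
        target = run.hyperparameters if row["type"] == "param" else run.metrics
        target[row["key"]] = row["value"]
    return run


tool_a_record = {"params": {"epochs": 10}, "metrics": {"accuracy": 0.9}}
common = import_from_tool_a(tool_a_record)
tool_b_rows = export_to_tool_b(common)
```

With a shared representation, each tool needs only one importer and one exporter, rather than a dedicated converter for every pair of tools.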
Future work:
★ Extend the meta-model to make it configurable
○ Not all valid uses require support for the meta-model in its entirety. Hence, it might be desirable for new tools to implement support for a subset of the meta-model based on their specific needs.
★ Unify additional tools proposed in academic research
★ Connect to available MDE tools and services
○ This makes a plethora of MDE work applicable to a new context in machine learning, e.g., tools for model analysis, simulation, refactoring, quality assurance, testing, and many others.