Smart modeling of smart software

@JordiCabot / jordicabot.com / modeling-languages.com
Jordi Cabot

SOM Research Lab
Software runs the world. Models run the software

Our mission
We are interested in the broad area of systems and software engineering, especially promoting the rigorous use of models and engineering principles while keeping an eye on the most unpredictable element in any project: the people.
Flickr/clement127

To model, or not to model, this is the WRONG
question
- Shakespeare

Hidden Technical Debt in Machine Learning Systems - Google -
https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

Smart / AI-enhanced / ML-enabled software
[Figure labels: traditional software, AI component]

AI components are mostly software
[Figure labels: Input/Output, Messaging channels, External Platforms, NLU Engine, Events!]

How to develop better smart software faster?

• Grady Booch – history of software engineering
The entire history of software engineering is that of the rise in levels of abstraction
- Grady Booch

To generate better smart software faster we should:
• (Semi)automate the generation of smart software
• Better models for smart software
• Better ways to create such models

In most of my papers in the last 3 years

Models
A social artifact that is acknowledged by an observer to
represent an abstraction of some domain for a particular
purpose. – E. Proper & G. Guizzardi
Everything is a model – J. Bézivin

ML Models are models
• For instance, you can see them as model transformations that generate output data from their input
• Many ML operations (e.g. fine tuning) can be mapped to “classical” model manipulation operations such as model refining
If we agree on this, then it’s completely natural to manipulate all kinds of AI models and model manipulation operations using (extensions of) existing software modeling techniques

All major vendors are “going modeling”

Keras can be considered a modeling
framework for NN

Modeling
Smart software
Modeling
software

One app -> Multiple (smart) interfaces

E Planas, G Daniel, M Brambilla, J Cabot:
Towards a model-driven approach for multiexperience AI-based

Xatkit Domain Specific Language
• Bots are created with the Xatkit DSL offering a bot-specific
syntax for creating:
– Intents the bot needs to match
– The behavior to execute in response to the matched intents
• The language is designed to easily integrate and combine
all types of IUIs
https://github.com/xatkit-bot-platform

Our DSL is implemented as a Java Fluent API
• Create bots using your preferred Java editor
– Benefit from all existing Java tooling when developing bots (e.g. debuggers)
– Reuse any Java library for complex bot behaviors
– Intuitive Fluent Interfaces to help you create advanced conversations
– Based on state machine semantics to build any type of bot
• (we did try first to implement our DSL as an external DSL but we were
reinventing the wheel)
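
To make the fluent, state-machine style concrete, here is a minimal sketch. It is written in Python for brevity rather than Java, and it is not Xatkit's actual API: the names `Intent`, `State`, `Bot`, `when_intent`, `move_to` and `reply` are hypothetical, and intent matching is reduced to substring matching on training sentences where a real bot would call an NLU engine.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Intent:
    name: str
    training_sentences: list[str]

    def matches(self, utterance: str) -> bool:
        # Stand-in for an NLU engine: case-insensitive substring match.
        text = utterance.lower()
        return any(s.lower() in text for s in self.training_sentences)


@dataclass
class State:
    name: str
    body: str = ""  # reply sent when the bot enters this state
    transitions: list[tuple[Intent, str]] = field(default_factory=list)
    _pending: Optional[Intent] = None

    # Fluent methods return self so that calls can be chained.
    def reply(self, text: str) -> State:
        self.body = text
        return self

    def when_intent(self, intent: Intent) -> State:
        self._pending = intent
        return self

    def move_to(self, target: str) -> State:
        if self._pending is None:
            raise ValueError("move_to() must follow when_intent()")
        self.transitions.append((self._pending, target))
        self._pending = None
        return self


class Bot:
    def __init__(self, initial: str):
        self.states: dict[str, State] = {}
        self.current = initial

    def state(self, name: str) -> State:
        # Create the state on first use, return the existing one afterwards.
        return self.states.setdefault(name, State(name))

    def handle(self, utterance: str) -> str:
        # Fire the first transition whose intent matches the utterance.
        for intent, target in self.states[self.current].transitions:
            if intent.matches(utterance):
                self.current = target
                return self.states[target].body
        return "Sorry, I did not understand."


greetings = Intent("Greetings", ["hi", "hello"])
how_are_you = Intent("HowAreYou", ["how are you"])

bot = Bot(initial="awaiting_input")
(bot.state("awaiting_input")
    .when_intent(greetings).move_to("greeted")
    .when_intent(how_are_you).move_to("asked"))
(bot.state("greeted").reply("Hi, nice to meet you!")
    .when_intent(how_are_you).move_to("asked"))
bot.state("asked").reply("I am fine, thanks.")
```

Calling `bot.handle("hello there")` moves the bot from `awaiting_input` to `greeted` and returns that state's reply. Because the definition is ordinary host-language code, debuggers and libraries work on it unchanged, which is the benefit the slide attributes to an internal DSL.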

Execution DSL example – State machines

Xatkit is available on GitHub
https://github.com/xatkit-bot-platform

BETTER (META)DATA FOR BETTER ML -> DATA CAN BE A SOURCE OF BIASES
• E.g. 1 Facial analysis: a dataset with limited representativeness of skin colors could fail to recognize people
• E.g. 2 NLP, different accents: a dataset limited to Australian voice records may fail when trying to understand British speakers

Annotating datasets
• Some ongoing concerns and proposals from
the AI community to clarify the
representativeness of the data, its provenance,
possible social issues,…
– E.g. Datasheets for datasets – Gebru et al, CACM
• But there are currently no industry standards
for documenting ML datasets

DescribeML
• A DSL to describe ML datasets
• Collecting and “unifying” different templates
and guidelines from the ML community
A domain-specific language for describing machine
learning datasets, arXiv:2207.02848
Joan Giner-Miguelez, Abel Gómez, Jordi Cabot
https://github.com/SOM-Research/DescribeML
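
As an illustration of what a machine-readable dataset description makes possible, the following sketch records provenance and composition and then flags underrepresented groups. It does not use DescribeML syntax: the class `DatasetDescription`, its fields and the sample values are all hypothetical, chosen to echo the accent example from the bias slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    name: str
    provenance: str  # who gathered the data and how
    # Share of records per value of an attribute, e.g. accent -> {value: share}.
    composition: dict[str, dict[str, float]] = field(default_factory=dict)
    social_concerns: list[str] = field(default_factory=list)

    def underrepresented(self, attribute: str, threshold: float = 0.1) -> list[str]:
        """Return the values of `attribute` whose share is below `threshold`."""
        shares = self.composition.get(attribute, {})
        return sorted(v for v, share in shares.items() if share < threshold)


# Made-up description of a speech dataset (illustrative values only).
speech = DatasetDescription(
    name="voice-records-example",
    provenance="hypothetical call-centre recordings",
    composition={"accent": {"Australian": 0.95, "British": 0.03, "Irish": 0.02}},
    social_concerns=["speakers with other accents may be misrecognized"],
)
```

With the metadata in structured form, a check such as `speech.underrepresented("accent")` can run before training instead of the bias surfacing in production.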

DescribeML
Tool
• Plug-in for VSCode
• Implemented with Langium

+APPLICATIONS
Searching and comparing datasets
based on your concrete needs
Replication of ML experiments when
the original data is not available

Modeling development processes
for smart software projects

Motivation
Development of modern applications –which include AI
components as core logic– requires multidisciplinary teams
with diverse skill sets: software engineers, data scientists,
psychologists, AI experts…
Diversity may lead to communication issues or misapplication
of best practices.
There is a need for more support and guidance when
developing AI-based software.

Process modeling
A process model provides full visibility and traceability
about:
● the work decomposition within an organization
● the responsibilities of their participants
● the standards and knowledge it is based on
Process models are guidelines for configuration,
execution and continuous improvement.

Proposal
We propose a domain-specific language (DSL) to facilitate the modeling of AI engineering processes, based on the analysis of research and gray literature.
Our DSL:
● does not prescribe any concrete AI engineering process model
● offers the modeling constructs to easily define your own processes
Expected benefits:
● Design, enactment, automation and monitoring of AI processes
● Detection of hidden or conflicting practices
● Simpler onboarding of new team members
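
A process model of this kind can be sketched as plain data plus a consistency check. The snippet below is an assumption-laden illustration, not the DSL proposed in the talk: `Activity`, `Process`, `performed_by` and `problems` are hypothetical names, and the three activities are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    performed_by: list[str] = field(default_factory=list)  # role names
    next: list[str] = field(default_factory=list)          # following activities


@dataclass
class Process:
    name: str
    roles: set[str]
    activities: dict[str, Activity] = field(default_factory=dict)

    def add(self, activity: Activity) -> Process:
        self.activities[activity.name] = activity
        return self

    def problems(self) -> list[str]:
        """Report activities with no owner, unknown roles or dangling successors."""
        found = []
        for a in self.activities.values():
            if not a.performed_by:
                found.append(f"{a.name}: no responsible role")
            for role in a.performed_by:
                if role not in self.roles:
                    found.append(f"{a.name}: unknown role {role}")
            for nxt in a.next:
                if nxt not in self.activities:
                    found.append(f"{a.name}: unknown next activity {nxt}")
        return found


# The process itself is user-defined; nothing here prescribes these steps.
process = Process("example-ai-process", roles={"data scientist", "software engineer"})
process.add(Activity("collect data", ["data scientist"], ["train model"]))
process.add(Activity("train model", ["data scientist"], ["deploy"]))
process.add(Activity("deploy", [], []))
```

Here `process.problems()` reports that `deploy` has no responsible role, a small example of how an explicit model can expose hidden gaps in responsibilities.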

Modeling Copilot - Easing
modeling

But we don’t have the data 
José Antonio Hernández López, Javier Luis Cánovas Izquierdo, Jesús Sánchez Cuadrado: ModelSet: a dataset for machine learning in model-driven engineering. Softw. Syst. Model. 21(3): 967-986 (2022)
Jordi Cabot, David Delgado and Lola Burgueño. Combining OCL and Natural Language: a Call for a Community Effort

Natural Language Processing (NLP) for model autocompletion
[Figure: Natural Language Processing assists Domain Modeling]
Loli Burgueño, Robert Clarisó, Sébastien Gérard, Shuai Li, Jordi Cabot: An NLP-Based Architecture for the Autocompletion of Partial Domain Models. CAiSE 2021: 91-106

Our approach
[Figure, built up over four slides: NLP-based architecture for model autocompletion]
• NLP components: a text preprocessing algorithm preprocesses (A.1) the domain corpus of text (domain docs); an NLP method for word embeddings is trained (A.2) to obtain the NLP models, which provide contextual knowledge and general knowledge; morphological analysis & lemmatization is also part of these components
• Model Recommendation Engine: uses the partial model and queries the NLP models (steps B.1 to B.5); the remaining steps in the figure are labeled update (C.1) and C.2
• Example suggestions:
– Add class named Plane
– Add class named Airline
– Add class named Airplane
– …
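
The recommendation idea can be sketched in a few lines, with loud simplifications: sentence-level co-occurrence counting is swapped in for word embeddings, a crude plural-stripping function stands in for morphological analysis and lemmatization, and the corpus is made up. The function names `lemma` and `suggest_classes` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "and", "by", "is", "are", "to",
             "each", "every", "one", "many", "in", "for"}


def lemma(word: str) -> str:
    # Crude stand-in for lemmatization: lowercase and strip a plural "s".
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def suggest_classes(partial_model: set[str], corpus: str, top: int = 3) -> list[str]:
    known = {lemma(c) for c in partial_model}
    scores: Counter = Counter()
    for sentence in re.split(r"[.!?]", corpus):
        terms = {lemma(w) for w in re.findall(r"[A-Za-z]+", sentence)} - STOPWORDS
        if terms & known:
            # Score every unknown term that co-occurs with a known class.
            scores.update(terms - known)
    # Highest score first, ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"Add class named {term.capitalize()}" for term, _ in ranked[:top]]


# Made-up domain corpus; the partial model contains a single class, Flight.
corpus = (
    "An airline operates flights. Each flight is operated by one airline. "
    "A flight is flown by a plane. Every plane serves many flights."
)
suggestions = suggest_classes({"Flight"}, corpus, top=2)
```

Raising `top` in this sketch quickly surfaces verbs such as "operate", which hints at why the architecture on the slides relies on trained NLP models and proper lemmatization rather than raw counts.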

Inferring models from your data
• There is a hidden model in your database, in
your API or web forms
• Or your CSVs
• Even in your docs, manuals and regulations
How far can we push AI technologies (e.g. pretrained language models) to make sense of
the implicit models behind all these data?

Bots for Open Data project
• Empowering citizens to benefit from open data sources
(>1.4M in the European data portal)
• Partial Bot models are generated from the CSV/JSON file
– E.g. checking the type of the columns we can generate obvious
questions (max, avg, min for numeric ones)
– Ontologies could be used to package more semantic libraries of
questions
– TextToSQL used as default fallback
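
The column-type heuristic from the bullets above can be sketched as follows. This is an illustration under assumptions, not the project's generator: `generate_questions` and `is_number` are hypothetical names, the question templates are invented, and the CSV sample is made up.

```python
import csv
import io


def is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def generate_questions(csv_text: str) -> list[str]:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    questions = []
    for column in rows[0].keys():
        # Ignore empty or missing cells when guessing the column type.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        if values and all(is_number(v) for v in values):
            # Numeric column: the obvious aggregate questions.
            for op in ("maximum", "average", "minimum"):
                questions.append(f"What is the {op} {column}?")
        else:
            # Non-numeric column: a simple lookup question.
            questions.append(f"Which rows have a given {column}?")
    return questions


# Made-up open-data sample.
sample = "station,bikes_available\nA,12\nB,7\n"
questions = generate_questions(sample)
```

Each generated question would become an intent in the partial bot model; anything the templates do not cover is what the TextToSQL fallback is meant to handle.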

LOCOSS project
DevOps for Smart
software (e.g. keep track
of the training data)

• Augmented reality
• Emotion-aware UI
• Security aspects
• DSL for prompts
• Uncertainty in ML
• Leverage semantics
• Testing AI/non-AI interactions
• Modeling ethical requirements

jordi.cabot@icrea.cat
@JordiCabot
jordicabot.com
Let’s keep the discussion going!
Looking for PhDs and Postdocs in Barcelona
