Smart Modeling of smart software
@JordiCabot / jordicabot.com / modeling-languages.com
Jordi Cabot
About
SOM Research Lab
Software runs the world. Models run the software
Our mission
We are interested in the broad area of
systems and software
engineering, especially
promoting the rigorous use of
models and
engineering principles
while keeping an eye on the most
unpredictable element in any project:
the people.
Flickr/clement127
Warm-up
Modeling
To model, or not to model, this is the WRONG
question
- Shakespeare
Smart Software
Hidden Technical Debt in Machine Learning Systems - Google -
https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
Smart / AI-enhanced / ML-enabled software
Trad
Software
AI
component
AI components are mostly software
Input/Output
Messaging channels
External
Platforms
NLU Engine
Events!
How to develop better smart software
faster?
• Grady Booch – history of software engineering
The entire history of software engineering is that of
the rise in levels of abstraction
- Grady Booch
To generate better Smart software
faster we should
(Semi)automate
the generation of
Smart software
Better models for
Smart software
Better ways to
create such
models
Modeling smart software
In most of my papers in the last 3 years
Models
A social artifact that is acknowledged by an observer to
represent an abstraction of some domain for a particular
purpose. – E. Proper & G. Guizzardi
Everything is a model – J. Bézivin
ML Models are models
• For instance, you can see them as model transformations that generate output data from their input
• Many ML operations (e.g. fine-tuning) can be mapped to “classical” model manipulation operations such as model refining
If we agree on this, then it’s completely natural to manipulate all kinds of AI models and
model manipulation operations using (extensions of) existing software modeling
techniques
All major vendors are “going modeling”
Keras can be considered a modeling
framework for NN
Modeling
Smart software
Modeling
software
Modeling smart interfaces
One app -> Multiple (smart) interfaces
E Planas, G Daniel, M Brambilla, J Cabot:
Towards a model-driven approach for multiexperience AI-based
Intent package metamodel
Behavioural package metamodel
Runtime package metamodel
Xatkit Domain Specific Language
• Bots are created with the Xatkit DSL offering a bot-specific
syntax for creating:
– Intents the bot needs to match
– The behavior to execute in response to the matched intents
• The language is designed to easily integrate and combine
all types of IUIs
https://github.com/xatkit-bot-platform
Our DSL is implemented as a Java Fluent API
• Create bots using your preferred Java editor
– Benefit from all existing Java tooling when developing bots (e.g. debuggers)
– Reuse any Java library for complex bot behaviors
– Intuitive Fluent Interfaces to help you create advanced conversations
– Based on state machine semantics to build any type of bot
• (we did try first to implement our DSL as an external DSL but we were
reinventing the wheel)
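To make the two ingredients above concrete (intents to match, and state-machine behavior that reacts to them), here is a minimal sketch of a fluent interface for a bot. It is written in Python for brevity and is not the Xatkit API (Xatkit is a Java library): the names `Intent`, `State`, `Bot`, `sentence`, `when`, `on_enter` and `build_demo_bot` are hypothetical, and the intent matcher is a toy exact-match lookup standing in for a real NLU engine.

```python
from typing import Callable, Dict, List, Optional


class Intent:
    """A type of request the bot must understand, given by example sentences."""

    def __init__(self, name: str):
        self.name = name
        self.training_sentences: List[str] = []

    def sentence(self, text: str) -> "Intent":
        self.training_sentences.append(text.strip().lower())
        return self  # returning self is what makes the interface fluent


class State:
    """A state with a body run on entry and intent-triggered transitions."""

    def __init__(self, name: str):
        self.name = name
        self.body: Callable[[], str] = lambda: ""
        self.transitions: Dict[str, str] = {}

    def on_enter(self, body: Callable[[], str]) -> "State":
        self.body = body
        return self

    def when(self, intent: Intent, target: str) -> "State":
        self.transitions[intent.name] = target
        return self


class Bot:
    FALLBACK = "Sorry, I did not get that."

    def __init__(self, initial: str):
        self.intents: List[Intent] = []
        self.states: Dict[str, State] = {}
        self.current = initial

    def add_intent(self, intent: Intent) -> "Bot":
        self.intents.append(intent)
        return self

    def add_state(self, state: State) -> "Bot":
        self.states[state.name] = state
        return self

    def recognize(self, utterance: str) -> Optional[Intent]:
        # Toy matcher (assumption): exact match against the training sentences.
        text = utterance.strip().lower()
        for intent in self.intents:
            if text in intent.training_sentences:
                return intent
        return None

    def handle(self, utterance: str) -> str:
        intent = self.recognize(utterance)
        transitions = self.states[self.current].transitions
        if intent is None or intent.name not in transitions:
            return self.FALLBACK  # stay in the current state
        self.current = transitions[intent.name]
        return self.states[self.current].body()


def build_demo_bot() -> Bot:
    greet = Intent("Greet").sentence("Hi").sentence("Hello")
    ask_hours = Intent("AskHours").sentence("When are you open?")
    return (
        Bot(initial="Idle")
        .add_intent(greet)
        .add_intent(ask_hours)
        .add_state(State("Idle").when(greet, "Greeting"))
        .add_state(State("Greeting")
                   .on_enter(lambda: "Hello! How can I help?")
                   .when(ask_hours, "Hours"))
        .add_state(State("Hours")
                   .on_enter(lambda: "We are open 9 to 5.")
                   .when(greet, "Greeting"))
    )


if __name__ == "__main__":
    demo = build_demo_bot()
    for message in ("hello", "When are you open?"):
        print(demo.handle(message))
```

Because the whole definition is ordinary host-language code, the usual editor support, debuggers and libraries apply, which is the argument the slide makes for an internal DSL over an external one.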
Intent DSL example
Execution DSL example – State
machines
Xatkit is available on GitHub
https://github.com/xatkit-bot-platform
Modeling datasets
BETTER (META)DATA FOR BETTER ML -> DATA CAN BE A SOURCE OF BIASES
E.g. 1 Facial analysis
A dataset with limited representativeness of skin colors could fail to recognize people
NLP: Different accents
A dataset limited to Australian voice records may fail when trying to understand British speakers
Annotating datasets
• Some ongoing concerns and proposals from
the AI community to clarify the
representativeness of the data, its provenance,
possible social issues,…
– E.g. Datasheets for datasets – Gebru et al, CACM
• But there are currently no industry standards
for documenting ML datasets
DescribeML
• A DSL to describe ML datasets
• Collecting and “unifying” different templates
and guidelines from the ML community
A domain-specific language for describing machine
learning datasets, arXiv:2207.02848
Joan Giner-Miguelez, Abel Gómez, Jordi Cabot
https://github.com/SOM-Research/DescribeML
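As an illustration of what a structured dataset description makes possible, the sketch below records provenance, composition and social concerns in a plain Python data class and then checks representativeness. It does not use DescribeML syntax: the class `DatasetDescription`, its fields, the `underrepresented` helper, the 5% threshold and the toy voice dataset are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetDescription:
    name: str
    description: str
    gathering_process: str  # provenance: how the data was collected
    annotators: str         # provenance: who labeled the data
    # composition: attribute -> {category: share of the records}
    distributions: Dict[str, Dict[str, float]] = field(default_factory=dict)
    social_concerns: List[str] = field(default_factory=list)

    def underrepresented(self, attribute: str, expected: List[str],
                         min_share: float = 0.05) -> List[str]:
        """Return the expected categories whose share falls below min_share."""
        observed = self.distributions.get(attribute, {})
        return [c for c in expected if observed.get(c, 0.0) < min_share]


# Toy description (invented numbers) echoing the accent example above.
voices = DatasetDescription(
    name="toy-voice-commands",
    description="Short spoken commands for a voice assistant",
    gathering_process="Volunteers recorded with their own phones",
    annotators="Two in-house transcribers",
    distributions={"accent": {"Australian": 0.97, "British": 0.03}},
    social_concerns=["Accent imbalance may hurt non-Australian speakers"],
)

if __name__ == "__main__":
    print(voices.underrepresented("accent", ["Australian", "British", "Indian"]))
```

Once such descriptions are machine-readable, searching and comparing datasets against concrete needs becomes a query over the descriptions rather than a manual read of each datasheet.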
DSL METAMODEL
DescribeML
Tool
• Plug-in for VSCode
• Implemented with Langium
+APPLICATIONS
Searching and comparing datasets
based on your concrete needs
Replication of ML experiments when
the original data is not available
Modeling development processes
for smart software projects
Motivation
Development of modern applications –which include AI
components as core logic– requires multidisciplinary teams
with diverse skill sets: software engineers, data scientists,
psychologists, AI experts…
Diversity may lead to communication issues or misapplication
of best practices.
There is a need for more support and guidance when
developing AI-based software.
Process modeling
A process model provides full visibility and traceability
about:
● the work decomposition within an organization
● the responsibilities of their participants
● the standards and knowledge it is based on
Process models are guidelines for configuration,
execution and continuous improvement.
Proposal
Our DSL:
● does not prescribe any concrete AI engineering process model
● offers the modeling constructs to easily define your own processes
Expected benefits:
● Design, enactment, automation and monitoring of AI processes
● Detect hidden or conflicting practices
● Simplify the onboarding of new team members
We propose a domain-specific language (DSL) to facilitate
the modeling of AI engineering processes,
based on the analysis of research and gray literature.
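A rough sketch of what modeling a process (who is in charge of what) can buy you: once activities, roles and dependencies are explicit, simple checks can surface gaps automatically. The classes `Activity` and `Process`, the `issues` check and the example process are hypothetical and are not the constructs of the proposed DSL.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    name: str
    responsible: List[str] = field(default_factory=list)  # role names
    requires: List[str] = field(default_factory=list)     # activities that must come first


@dataclass
class Process:
    name: str
    roles: List[str]
    activities: List[Activity] = field(default_factory=list)

    def issues(self) -> List[str]:
        """Report activities without an owner, unknown roles and dangling dependencies."""
        problems: List[str] = []
        known = {a.name for a in self.activities}
        for activity in self.activities:
            if not activity.responsible:
                problems.append(f"{activity.name}: nobody is responsible")
            for role in activity.responsible:
                if role not in self.roles:
                    problems.append(f"{activity.name}: unknown role {role}")
            for dependency in activity.requires:
                if dependency not in known:
                    problems.append(
                        f"{activity.name}: depends on undefined activity {dependency}")
        return problems


# Toy process definition (invented for illustration).
process = Process(
    name="Toy AI engineering process",
    roles=["Data scientist", "Software engineer"],
    activities=[
        Activity("Data collection", ["Data scientist"]),
        Activity("AI modeling", [], ["Data collection"]),
        Activity("Deployment", ["DevOps engineer"], ["Testing"]),
    ],
)

if __name__ == "__main__":
    for problem in process.issues():
        print(problem)
```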
Excerpt: AIModelingActivity
Supporting modeling tool
Smart Modeling
Modeling Copilot - Easing
modeling
Mendix Assist
OutSystems.ai
But we don’t have the data
José Antonio Hernández López, Javier Luis Cánovas Izquierdo, Jesús Sánchez Cuadrado:
ModelSet: a dataset for machine learning in model-driven engineering. Softw. Syst. Model. 21(3): 967-986 (2022)
Jordi Cabot, David Delgado and Lola Burgueño:
Combining OCL and Natural Language: a Call for a Community Effort
Natural Language Processing (NLP) for model autocompletion
[Figure: Natural Language Processing assists Domain Modeling]
Loli Burgueño, Robert Clarisó, Sébastien Gérard, Shuai Li, Jordi Cabot:
An NLP-Based Architecture for the Autocompletion of Partial Domain Models. CAiSE 2021: 91-106
Our approach
[Figure, built up over four slides: architecture of the model autocompletion approach. Labels recovered from the diagram:]
• Inputs: the partial model and the domain docs
• A.1 preprocess: a text preprocessing algorithm turns the domain docs into a domain corpus of text
• A.2 train: an NLP method for word embeddings
• NLP components: morphological analysis & lemmatization, and NLP models holding contextual knowledge and general knowledge
• B.1 to B.5 query: the Model Recommendation Engine uses the partial model and queries the NLP models
• C.1 update, C.2
• Example suggestions: Add class named Plane, Add class named Airline, Add class named Airplane, …
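The recommendation step can be sketched roughly as follows: look up words that are close, in embedding space, to the classes already in the partial model, keep only singular nouns, and turn the survivors into suggestions. This is only an illustration of the idea, not the implementation from the paper: the three-dimensional vectors and the part-of-speech tags are hand-written toy values, and `suggest_classes`, `threshold` and `top_k` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Toy embeddings and POS tags (assumption: in practice these would come from
# trained word-embedding models and a morphological analyser).
EMBEDDINGS: Dict[str, Tuple[float, float, float]] = {
    "flight":   (0.9, 0.1, 0.0),
    "plane":    (0.8, 0.2, 0.1),
    "airline":  (0.7, 0.3, 0.0),
    "airplane": (0.8, 0.2, 0.0),
    "fly":      (0.9, 0.0, 0.1),
    "planes":   (0.8, 0.2, 0.1),
    "invoice":  (0.0, 0.1, 0.9),
}
POS: Dict[str, str] = {
    "flight": "NOUN", "plane": "NOUN", "airline": "NOUN", "airplane": "NOUN",
    "fly": "VERB", "planes": "NOUN_PLURAL", "invoice": "NOUN",
}


def cosine(u: Tuple[float, ...], v: Tuple[float, ...]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest_classes(partial_model: List[str], top_k: int = 3,
                    threshold: float = 0.9) -> List[str]:
    existing = {name.lower() for name in partial_model}
    known = [name for name in existing if name in EMBEDDINGS]
    if not known:
        return []
    scored = []
    for word, vector in EMBEDDINGS.items():
        # Discard words already modeled and anything that is not a singular noun.
        if word in existing or POS[word] != "NOUN":
            continue
        score = max(cosine(vector, EMBEDDINGS[name]) for name in known)
        if score >= threshold:
            scored.append((score, word))
    scored.sort(reverse=True)
    return [f"Add class named {word.capitalize()}" for _, word in scored[:top_k]]


if __name__ == "__main__":
    for suggestion in suggest_classes(["Flight"]):
        print(suggestion)
```

The modeler stays in control: the output is a ranked list of suggestions to accept or reject, not an automatic edit of the model.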
‘Low-modeling’ strategies
Inferring models from your data
• There is a hidden model in your database, in
your API or web forms
• Or your CSVs
• Even in your docs, manuals and regulations
How far can we push AI technologies (e.g. pretrained language models) to make sense of
the implicit models behind all these data?
Bots for Open Data project
• Empowering citizens to benefit from open data sources
(>1.4M in the European data portal)
• Partial Bot models are generated from the CSV/JSON file
– E.g. checking the type of the columns we can generate obvious
questions (max, avg, min for numeric ones)
– Ontologies could be used to package more semantic libraries of
questions
– TextToSQL used as default fallback
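The column-type heuristic mentioned above can be sketched in a few lines: read the CSV, guess whether each column is numeric, and emit the obvious questions for it. The function names, the question templates and the sample data are assumptions for illustration; this is not the project's actual generator.

```python
import csv
import io
from typing import Dict, List


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text: str) -> Dict[str, str]:
    """Label each column 'numeric' if every value parses as a number, else 'text'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types: Dict[str, str] = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[column] = "numeric" if is_numeric else "text"
    return types


def generate_questions(csv_text: str) -> List[str]:
    """Generate the obvious questions a bot should answer for each column."""
    questions: List[str] = []
    for column, kind in infer_column_types(csv_text).items():
        if kind == "numeric":
            for aggregate in ("maximum", "average", "minimum"):
                questions.append(f"What is the {aggregate} {column}?")
        else:
            questions.append(f"Which values of {column} are there?")
    return questions


# Toy open-data file (invented values).
SAMPLE = "city,population\nBarcelona,1600000\nGirona,100000\n"

if __name__ == "__main__":
    for question in generate_questions(SAMPLE):
        print(question)
```

Each generated question would become an intent of the partial bot model; anything the templates do not cover falls through to the text-to-SQL fallback.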
Putting it all together
LOCOSS project
DevOps for Smart
software (e.g. keep track
of the training data)
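One simple way to keep track of the training data across releases is to store a fingerprint of the data and of the hyperparameters with every version, so that a regression can be traced back to what actually changed. The sketch below is an assumption about how this could look, not part of the LOCOSS project: `Release`, `make_release` and `what_changed` are hypothetical names, and the data is assumed to be JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List


def _digest(value: Any) -> str:
    """Stable SHA-256 fingerprint of any JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Release:
    version: str
    data_hash: str
    hyperparameters_hash: str


def make_release(version: str, training_rows: List[Dict[str, Any]],
                 hyperparameters: Dict[str, Any]) -> Release:
    return Release(version, _digest(training_rows), _digest(hyperparameters))


def what_changed(old: Release, new: Release) -> List[str]:
    """List which ML inputs differ between two releases."""
    changed: List[str] = []
    if old.data_hash != new.data_hash:
        changed.append("training data")
    if old.hyperparameters_hash != new.hyperparameters_hash:
        changed.append("hyperparameters")
    return changed


if __name__ == "__main__":
    rows = [{"text": "hello", "intent": "Greet"}]
    v1 = make_release("1.0", rows, {"epochs": 10})
    v2 = make_release("1.1", rows + [{"text": "hi", "intent": "Greet"}], {"epochs": 10})
    print(what_changed(v1, v2))
```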
++Open Challenges
• Augmented reality
• Emotion-aware UI
• Security aspects
• DSL for prompts
• Uncertainty in ML
• Leverage semantics
• Testing AI/non-AI interact
• Modeling ethical reqs
jordi.cabot@icrea.cat
@JordiCabot
jordicabot.com
Let’s keep the discussion going!
Looking for PhDs and Postdocs in Barcelona


Editor's Notes

  • #5 Similar to many of you
  • #6 Before we talk about the relationship between modeling and AI, we should make sure we have a common understanding of these concepts
  • #8 I’ll skip the introduction to MDE since you’re all experts on this here. I’ll just mention two points
  • #9 There is always a model and we all model, at least in our head. The real question is whether to make the models explicit. This is linked to a trade-off analysis of the cost of these models vs the benefits of having them explicit. As we’ll see, AI can reduce the cost and maximize these benefits.
  • #10 Everything I say applies also to lowcode.
  • #13 This does not contradict the previous slide; it simply stresses that this smart software is still software, something that, as we’ll see, is also very true for chatbots.
  • #14 Smart software is the union of traditional software components and AI components. Some traditional software components do not interact with AI ones. Same for the AI ones. But others do interact.
  • #15 Simple example: an e-commerce store. Traditional software components -> database to store the catalogue of products and the orders. Front-end AI components -> chatbot. Back-end AI components -> recommender system for product suggestions.
  • #16 Let me insist on the point that even for the AI components, most of their core logic is not really related to AI. So we are still the best prepared to build such components!
  • #17 So let’s go back to the original question. What is your answer???
  • #18 Our solution is not that “original”
  • #23 We have a marketing problem!!!
  • #24 There are quite a few tools that help you model your neural network and ML process
  • #25 1 embedding layer (to move from words to arrays of numbers), 2 hidden layers, 1 final output layer
  • #26 It’s not radically new, it’s an adaptation / evolution of what we know how to do. Note that this is also useful for pure AI researchers that are taking notes. “AI researcher taking notes” according to Dali. Let’s see now three examples of modeling extensions for smart software.
  • #27 Let’s now see the first example of a modeling extension to cover smart software. Note that as AI is mostly software, there are many facets of smart software that do not need extensions at all
  • #28 Concept of multi-experience interaction
  • #29 Any modality is transformed into a text representation
  • #30 Intents are the types of requests the interface must understand
  • #33 These concepts are implemented in Xatkit
  • #38 Not modeling the data -> modeling the dataset
  • #43 Langium is a new language workbench with “modern” technologies (TypeScript and Node.js)
  • #50 We’re not modeling the data pipeline. Or not just that. We are modeling who is in charge of what, which is a different perspective.
  • #54 Beyond an autocomplete similar to the Mendix Assist, they also offer another interesting type of suggestions for data models.
  • #55 Even less for any language other than UML. We’re trying to reuse SQL datasets to get some OCL training data.
  • #56 Instead of a copilot pretrained on a large corpus of code samples, we went, for now, for a more pragmatic solution that, instead of automatically autocompleting parts of the model, suggests new concepts to be added to it, using both general and contextual knowledge.
  • #59 The slice 1 was aimed at obtaining new classes. We use the POS tag to filter the list by discarding verbs, adjectives and plural nouns.
  • #60 The slice 1 was aimed at obtaining new classes. We use the POS tag to filter the list by discarding verbs, adjectives and plural nouns.
  • #63 Or “before”, “after”, “during” for dates. If we know salary is an integer we can generate a number of questions. If we know Salary is a concept we can ask others.
  • #66 Code generation implies triggering the training of an ML model. Versioning of the system may require keeping track of the training data (and the hyperparameters) so that, if the system behaves worse, we can analyse whether it’s due to changes in the training data. Continuous monitoring is a must.
  • #67 Beyond all those that we have already mentioned, there are plenty of other challenges
  • #68 e.g. role-based access control for smart front-ends, prompt injection attacks
  • #70 If you want to keep exploring and building these topics together ...