MoLe aims to facilitate the development and use of digital twins for smart factories, so factory stakeholders can enjoy all the benefits of digital twin technologies with little effort.
1. Speaker: Mauricio Fadel Argerich, NEC Laboratories Europe GmbH
Model Learning for Cloud-Edge Digital Twin
2. MoLe: Model Learning for Cloud-Edge Digital Twin
• With MoLe we aim to simplify the implementation and execution of Digital Twins in digital factories by:
  • Utilizing FIWARE technologies (NGSI-LD, Scorpio, FogFlow) to dynamically orchestrate a setup to generate a Digital Twin from data and execute it on Edge and Cloud
  • Reducing the effort needed to develop prediction and simulation models by using Knowledge Infusion
3. I4.0Lab – Manufacturing & Assembly Process
• We have developed our solution for the MIDIH Didactic Factory in Milan
• The factory implements the Manufacturing and Assembly (M&A) process of PCBs
• 7 steps, carried out by different stations
4. Translator and Scorpio
• Stations in the didactic factory use NGSIv2; Scorpio uses NGSI-LD
• The Translator reads sensor data from the Factory Information Bus and uses the Kafka metadata (topic, key, message schema) to transform it into NGSI-LD entities
• The NGSI-LD data is received by the Scorpio Broker, where it is stored and sent to any subscribers
• Scorpio optimizations:
  • We optimized data handling and serialization, as well as error handling
  • We optimized Scorpio's vertical scaling and internal data handling
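The translation step can be sketched as follows. This is a minimal illustration of turning a Kafka message plus its metadata (topic, key) into an NGSI-LD entity; the helper name `to_ngsi_ld` and the exact entity shape are assumptions, not the actual Translator code.

```python
import json

def to_ngsi_ld(topic: str, key: str, payload: dict) -> dict:
    """Build an NGSI-LD entity from a Kafka message and its metadata.

    Illustrative sketch: the id is derived from the Kafka topic and key,
    the topic doubles as the entity type, and every field of the raw
    payload becomes an NGSI-LD Property.
    """
    entity = {
        "id": f"urn:ngsi-ld:{topic}:{key}",
        "type": topic,
        "@context": "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    }
    for name, value in payload.items():
        entity[name] = {"type": "Property", "value": value}
    return entity

# Example: a sensor reading from a (hypothetical) press station.
msg = {"temperature": 22.5, "state": "Working"}
entity = to_ngsi_ld("PressStation", "press-01", msg)
print(json.dumps(entity, indent=2))
```

Deriving the id from topic and key is why the lessons-learnt slide recommends always setting a key on Kafka messages.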
[Diagram: Translator → Scorpio (with DB) → multiple Subscribers]
5. Results: Translator and Scorpio
Translator
• Worst-case delay of 1 ms between a message's arrival on the Kafka bus and its retrieval in the Python library
• Well within the KPI derived from the 2 Hz sensor sampling frequency (500 ms between messages)
Scorpio
• We compared it with Orion-LD, the NGSI-LD version of the FIWARE GE Orion
• Scorpio achieved on average half the latency of Orion-LD
[Chart: latency of Scorpio vs. Orion-LD]
6. Digital Twin Models
• We implemented Digital Twins (DTs) for the different stations in the M&A process
• DTs are programmable objects that can be instantiated for real-time monitoring and simulations
• DTs of the Front Cover Magazine and the Press Station
• These DTs implement specific models to:
  • detect the current status of the station based on its current sensor and actuator data
  • predict energy usage based on the same data
7. Knowledge Infusion
• Energy usage prediction → pure ML
• State inference model → Knowledge Infusion (KI) = ML + domain knowledge
• Domain knowledge is infused through Knowledge Functions (KFs)
  • KFs output a single value: a label
  • Functions implement human-provided logic and utilize facts derived from internal and external knowledge bases
  • Types of knowledge functions: Weak and Strong
• KI creates a Knowledge Model that serves two purposes:
• Data augmentation: improves data quality by creating new features or labelling data
• Robustness: allows us to correct some obviously wrong outputs of the ML model during runtime
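A rough sketch of what weak KFs and the Knowledge Model combining them could look like; the variable names (`motor_current`, `conveyor_on`) and thresholds are invented for illustration, not taken from the actual station models.

```python
# Weak Knowledge Functions: simple human-provided rules over sensor
# variables. Each returns a label ("Idle"/"Working") or None when the
# rule does not apply.

def kf_motor(sample: dict):
    # Rule: a motor drawing noticeable current means the station works.
    return "Working" if sample.get("motor_current", 0.0) > 0.5 else None

def kf_conveyor(sample: dict):
    # Rule: a stopped conveyor suggests the station is idle.
    return "Idle" if not sample.get("conveyor_on", False) else None

def knowledge_model(sample: dict) -> str:
    # The Knowledge Model combines the weak KFs: first non-None answer
    # wins; default to "Idle" when no rule fires.
    for kf in (kf_motor, kf_conveyor):
        label = kf(sample)
        if label is not None:
            return label
    return "Idle"

print(knowledge_model({"motor_current": 1.2, "conveyor_on": True}))  # Working
```

Such functions serve the data-augmentation purpose above: run over an unlabeled dataset, they produce (noisy) labels at near-zero cost.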
[Diagram: ML Model + Knowledge Model = KI Model]
8. FogFlow
• FogFlow was extended to serve ML models as serverless fog functions
• ML models are implemented based on FogFlow ML operators
• To implement a FogFlow ML model, we follow 3 steps:
  1. Model registration: register an ML model through a web-based GUI
  2. Model deployment: create and deploy a serverless fog function in FogFlow to run the ML model
  3. Model serving: apply the ML model inside the deployed function instances to produce the detection/prediction result from the input data and then update the state of the corresponding DT
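The model-serving step (3) can be pictured as below. This is a generic sketch, not the actual FogFlow operator interface: a function receives the DT entity's latest values, applies the model, and writes the result back as an attribute so the broker can update the Digital Twin. `predict_state` stands in for the registered ML model.

```python
def predict_state(values: dict) -> str:
    # Stand-in for the trained model; real code would load the model
    # registered in step 1 (names here are hypothetical).
    return "Working" if values.get("motor_current", 0.0) > 0.5 else "Idle"

def fog_function(entity: dict) -> dict:
    # Extract plain values from the entity's NGSI-LD Properties.
    values = {k: v["value"] for k, v in entity.items()
              if isinstance(v, dict) and v.get("type") == "Property"}
    # Apply the model and write the result back as a new Property,
    # updating the state of the corresponding DT.
    entity["state"] = {"type": "Property", "value": predict_state(values)}
    return entity

updated = fog_function({
    "id": "urn:ngsi-ld:PressStation:press-01",
    "type": "PressStation",
    "motor_current": {"type": "Property", "value": 1.1},
})
print(updated["state"]["value"])
```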
9. Results: Digital Twin Models
• Model for the station's state
  • Small high-quality dataset: manually labeled 10 minutes of data (2 Hz, 1200 data points total) as "Idle" or "Working"
  • Larger noisy dataset: around 50 minutes, or 6000 data points, labelled with 2 simple functions
• We used the small dataset to train a Random Forest Classifier (RFC) using all 45 features
  • First 600 samples to train the RFC and last 600 to evaluate it
  • The RFC achieved an accuracy of 82.11% on its test set
• We also implemented a Knowledge Model (KM), based on two programmable functions
  • Each function took no more than 5 minutes to write; they check variable values and return the state of the machine
  • Test accuracy is 70.50%
[Chart: performance of RFC on test set when trained with 50% of manually labeled data]
[Chart: performance of KM on test set with 2 simple labelling functions]
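The evaluation protocol (1200 points at 2 Hz, first 600 to train, last 600 to test) can be sketched as follows. The synthetic one-feature data and the threshold "classifier" are stand-ins chosen so the script is self-contained; the presentation used a scikit-learn RandomForestClassifier over all 45 real features.

```python
import random
from statistics import fmean

random.seed(0)

# Synthetic stand-in data: alternating "Working"/"Idle" samples, where
# "Working" samples show a higher value on one hypothetical feature.
data = [(random.gauss(1.0, 0.3), "Working") if i % 2 else
        (random.gauss(0.2, 0.3), "Idle") for i in range(1200)]

# Chronological split: first 600 samples train, last 600 test.
train, test = data[:600], data[600:]

# "Train" the stand-in model: place the decision threshold halfway
# between the per-class means seen in the training split.
mu_working = fmean(x for x, y in train if y == "Working")
mu_idle = fmean(x for x, y in train if y == "Idle")
threshold = (mu_working + mu_idle) / 2

def predict(x: float) -> str:
    return "Working" if x > threshold else "Idle"

# Accuracy = fraction of correct predictions on the held-out test set.
accuracy = fmean(predict(x) == y for x, y in test)
print(f"test accuracy: {accuracy:.2%}")
```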
10. Results: KI for Digital Twin Models
• We can use the KM to label the larger unlabeled dataset and train an ML model with it
  • Hopefully, the ML model will learn to filter out the noise
  • The ML model is trained with 10x more data than before
• We re-trained the RFC with this larger noisy dataset
  • Slight improvement in its performance: 82.33%
  • Without any costly, manually labeled data!
  • The KM took us about 10 minutes to implement
• We have also implemented a KI model that utilizes a supervision function
  • This function verifies the values of certain variables and forces the output of the ML model
  • This function represents a layer of safety
  • Accuracy in our tests remained the same
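The supervision function (a strong KF) can be sketched like this: it inspects a few variables and, when the evidence is unambiguous, overrides the ML model's output. The `power_on` variable and the rule itself are hypothetical examples of such a safety layer, not the actual station logic.

```python
def strong_kf(sample: dict, ml_output: str) -> str:
    """Safety layer: force the ML output when a rule clearly applies.

    Illustrative rule: a station whose power is switched off cannot be
    "Working", whatever the ML model says.
    """
    if not sample.get("power_on", True) and ml_output == "Working":
        return "Idle"  # correct an obviously wrong ML output
    return ml_output   # otherwise trust the ML model

print(strong_kf({"power_on": False}, "Working"))  # forced to "Idle"
print(strong_kf({"power_on": True}, "Working"))   # ML output kept
```

Because the rule only fires on obvious contradictions, test accuracy is unchanged when the ML model never produces such outputs, which matches the KPI 4 result below.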
11. Results: Energy Consumption Prediction
• We implemented a Random Forest Regressor to deal with correlated and non-informative variables
  • Energy consumption is influenced by the activity of the station, logged by its sensors
  • Some variables are correlated and others are nearly constant throughout the activity; it seems we may have only partial visibility
• We trained it using 80% of the full time series data for the Press Station and kept the remaining 20% as the test set
  • The Random Forest Regressor obtained a Mean Absolute Error of 2.75
[Chart: results on test set]
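The 80/20 chronological split and the Mean Absolute Error metric can be sketched as below. The tiny series and the naive "predict the last training value" baseline are placeholders so the arithmetic is visible; the presentation used a scikit-learn RandomForestRegressor on the Press Station time series.

```python
# Toy energy-consumption series (hypothetical values).
series = [10.0, 11.0, 12.5, 12.0, 13.0, 14.5, 14.0, 15.0, 16.0, 15.5]

# Chronological 80/20 split: no shuffling, so the test set lies strictly
# after the training data and there is no temporal leakage.
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

# Naive baseline model: always predict the last observed training value.
prediction = train[-1]

# Mean Absolute Error over the held-out test points.
mae = sum(abs(y - prediction) for y in test) / len(test)
print(f"MAE: {mae:.2f}")  # MAE: 0.75
```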
12. KPIs
KPI 1: Data velocity
1000 msgs per second >> 2 msgs per second (sampling frequency of
sensors in digital factory)
KPI 2: Generation of knowledge graph for the DT
The Translator is capable of generating a graph structure in the form of NGSI-LD entities. This is done automatically using metadata provided by the factory message bus.
KPI 3: Accuracy of KI for DT
Accuracy of ML model trained with hand labeled data: 82.11%
Accuracy of Knowledge Model: 70.50%
Accuracy of ML model trained with data labeled by Knowledge
Model: 82.33%
KPI 4: Accuracy of DT Refinement (Strong KF)
Accuracy of KI model with strong KF: 82.33%
Note: Same accuracy as without refinement because the Strong KF did not find any necessary corrections
13. Lessons learnt
NGSI Translator: A valuable tool for extracting knowledge from raw data; allows for more flexibility (NGSIv2, NGSI-LD). You can find the translator at
https://github.com/ScorpioBroker/ScorpioBroker/tree/feature-82/NGSILDTools/NGSILDTranslator
KAFKA NGSI-LD Integration: Kafka is a good choice as it is widespread and has excellent performance.
Recommendation: use a key on Kafka messages so an identifier is attached to the data.
Digital Factories Data: Data heterogeneity between factories is still very high.
Opportunity: tools/techniques to join data from different factories are valuable and needed!
Recommendation: it’s beneficial to publish example data from MIDIH factories.
Knowledge Infusion: it enabled us to train a classifier with no manually labeled data, achieving high
accuracy. KI shows great potential to reduce effort of creating ML models.
KI, ML models' performance and execution in FogFlow: The models achieved good accuracy, but we believe this can be further improved.