Controlling production processes is a delicate discipline that affects not only the quality of production but also its efficiency. In many processes, numerous factors interact to reach the objective, and their interactions are hard to understand when the parameters are examined one by one. This talk proposes an original approach that takes into account the multivariate nature of the problem ...
7. CORRELATED DATA
2D Example
Classical control chart:
→ Analyse one parameter at a time
→ Define a large monitoring area
→ Slow to identify an anomaly
Underperforming
8. CORRELATED DATA
2D Example
Classical control chart:
→ Analyse one parameter at a time
→ Define a small monitoring area
→ Too many false alarms
Not appropriate
Exceeds the required quality
9. CORRELATED DATA
2D Example
Multivariate control chart:
→ Analyse all the parameters at once
→ Define an appropriate monitoring area
Appropriate
10. MULTIVARIATE SPC
Mahalanobis Distance
Source: Woźniak et al., 2019
Based on a representative dataset:
→ Distance between a point and a distribution
→ Points 1 and 2 have the same distance
Gives a multidimensional distance
Alarm if the distance is too large
11. MULTIVARIATE SPC
Principal Component Analysis
Source: Woźniak et al., 2019
Based on a representative dataset:
→ Expresses the data along the axes with the most variation (principal components)
→ Reduces the number of dimensions
Gives the axes with the most variation
Identify the axis of the shift
16. CONCLUSION
Multivariate SPC
Advantages of multivariate control charts:
+ Can be applied to any complex field
+ Take into account all the characteristics of the measurement
+ Control charts that are representative of reality
17. FUTURE WORK
Uncertainties
• Include the uncertainty in the multivariate calculations
• Bayesian measurement refinement
→ Based on conditional probabilities
→ JCGM 106:2012 (or ISO/IEC Guide 98-4)
18. REFERENCES
• Woźniak, M., Gałązka-Friedman, J., Duda, P., Jakubowska, M., Rzepecka, P. and Karwowski, Ł. (2019). Application of Mössbauer spectroscopy, multidimensional discriminant analysis, and Mahalanobis distance for classification of equilibrated ordinary chondrites. Meteorit Planet Sci, 54: 1828-1839. https://doi.org/10.1111/maps.13314
• JCGM 106:2012. Evaluation of measurement data – The role of measurement uncertainty in conformity assessment.
• Saporta, G. (2011). Probabilités, analyse des données et Statistique.
Hello everyone, and thank you very much for listening to my presentation.
My presentation is about analysing multivariate data to monitor production.
Before going further, I would like to give you a bit of context.
We all know that the 21st century is heavily influenced by data. Multibillion-dollar companies such as Google or Amazon have used data to analyse our needs and create new ones. And not only the GAFAs: insurance companies also use large atmospheric and oceanographic datasets to evaluate the risk of floods and storms and to calculate their clients' premiums.
We see that analysing data is crucial to understanding any behaviour or phenomenon.
Industrial companies are starting to recognise this new resource. For an industrial company, analysing data is useful to characterise an instrument, to monitor production, and to predict a shift should one occur.
So here at Deltamu, we have been working with a geotextile manufacturer to analyse their production.
Geotextiles come in different types: woven, non-woven, or even knitted. They serve different purposes, such as separation, filtration or drainage, and are used in various fields such as roadworks, agriculture and so on.
To characterise a geotextile, we need a set of measurements that are more or less correlated with one another.
In this picture, we see the set of measurements (9 in total). Usually, these measurements are monitored independently.
Here is an example of monitoring the tensile strength, using a control chart based on the mean and the standard deviation.
Unfortunately, this approach does not treat the measurements as a complex set of correlated variables.
In this presentation, we will tackle this issue and present different mathematical tools that can be used. These tools are not new and are heavily used in other fields (psychology, finance, computing, …), but they are not yet widely used in industry.
For those who are not familiar with correlated data, here is an example of two correlated measurements: the thickness of the geotextile and the NF punching force. This test assesses the robustness of the material during use. It makes sense that the thicker the geotextile, the more robust the material. This correlation between the two measurements is captured in what we call a covariance matrix.
Put simply, a covariance matrix is a table showing how strongly each pair of measurements varies together; normalising it by the standard deviations gives the correlation matrix. If two measurements are independent, their correlation is 0. If the correlation is close to 1 or -1, the two measurements are strongly correlated.
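To make the link between covariance and correlation concrete, here is a minimal NumPy sketch on simulated data. The means, spreads and the linear relation between thickness and punching force are assumptions chosen for illustration only; they are not Deltamu's measurements.

```python
import numpy as np

# Hypothetical, simulated data: thickness (mm) and punching force (kN).
# All numbers are illustrative assumptions, not real geotextile measurements.
rng = np.random.default_rng(seed=0)
n = 500
thickness = rng.normal(loc=3.0, scale=0.2, size=n)
# Assume the punching force grows linearly with thickness, plus noise.
punching = 1.5 * thickness + rng.normal(loc=0.0, scale=0.1, size=n)

data = np.column_stack([thickness, punching])  # shape (n, 2): one row per sample

cov = np.cov(data, rowvar=False)        # 2x2 covariance matrix (units mix mm and kN)
corr = np.corrcoef(data, rowvar=False)  # 2x2 correlation matrix (unitless, diagonal = 1)

print("Covariance matrix:\n", cov)
print("Correlation matrix:\n", corr)
```

Dividing each covariance by the product of the two standard deviations gives the correlation, which is why the correlation matrix is unitless and easier to read across variables with different units.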
Pyramidal punch
NF G 38 019: Determination of punching resistance
Objective:
To assess the stresses the geotextile undergoes during installation or in service.
Methodology:
Determination of the force required for a pyramidal punch to pass through a geotextile specimen, perpendicular to the plane of the product. The methodology is the same as for the standard "NF EN ISO 12236: Static puncture test (CBR test)". The unit of measurement is the kN.
Now let's see what happens when we use basic control charts. What follows is just a 2D example to illustrate the current problem; the real benefit appears with more than two variables.
Presentation of the plot, showing the correlation.
By monitoring these two variables independently, we define a lower and an upper limit for each variable, resulting in the blue area, which is far too large compared with the real data. Because of the correlation, data are unlikely to fall in the top-left corner, yet the current control chart will not be able to detect such an anomaly.
Conversely, we may be tempted to shrink this area so as to detect any anomaly in our production. This is again not appropriate: it will give us a large number of false alarms, and we will not be able to distinguish between anomalies and normal data.
We also end up exceeding the required quality, as many measurements will be incorrectly declared non-compliant.
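Both failure modes of separate charts can be reproduced with a small simulation. This is a minimal sketch, assuming two standardised in-control variables with an illustrative correlation of 0.95; the corner point and the 1-sigma and 3-sigma limits are also assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical in-control data: two standardised variables, correlation assumed to be 0.95.
rng = np.random.default_rng(seed=1)
rho = 0.95
cov = np.array([[1.0, rho], [rho, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

mean = data.mean(axis=0)
std = data.std(axis=0, ddof=1)

def outside_box(points, k):
    """True where any variable leaves its own mean +/- k*std limits (one chart per variable)."""
    return np.any(np.abs(points - mean) > k * std, axis=1)

# Wide limits (k=3): a "top-left corner" point breaks the correlation,
# yet stays inside both individual limits, so no alarm is raised.
corner_point = np.array([[-2.0, 2.0]])
print("Corner point flagged with 3-sigma limits:", outside_box(corner_point, k=3)[0])

# Tight limits (k=1): a large share of in-control points is flagged (false alarms).
false_alarm_rate = outside_box(data, k=1).mean()
print(f"False-alarm rate with 1-sigma limits: {false_alarm_rate:.1%}")
```

The rectangle defined by separate charts cannot follow the diagonal shape of correlated data: widening it misses the corner point, tightening it rejects good product.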
One way to tackle this issue is to take the covariance of the data into account. By studying the density of the data, we can identify an area where the data are well represented, as shown by the ellipse on the graph. Any data point occurring outside this ellipse is then considered an anomaly.
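The ellipse can be written as the set of points whose squared Mahalanobis distance stays below a threshold. A minimal formulation, assuming the data are approximately bivariate normal with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ estimated from the representative dataset, and a chosen false-alarm risk $\alpha$ (symbols introduced here for illustration):

```latex
\mathcal{E}_\alpha = \left\{ \mathbf{x} \in \mathbb{R}^2 \;:\;
  (\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})
  \le \chi^2_{2,\,1-\alpha} \right\},
\qquad
\chi^2_{2,\,1-\alpha} = -2\ln\alpha
```

The closed form of the threshold holds because the chi-square distribution with 2 degrees of freedom has cumulative distribution $1 - e^{-x/2}$; for example, $\alpha = 0.01$ gives a threshold of about 9.21. When $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are estimated from a finite dataset, this threshold is only approximate.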
We have two different tools to characterise this anomaly.
First, we use the Mahalanobis distance to raise an alert when a measurement falls outside the ellipse.
Second, we use PCA to identify which variables are affected by this anomaly.
The Mahalanobis distance is like any other distance, except that it takes into account the correlation between the variables.
For example, points 1 and 2 have the same distance because they lie on the same density contour, which is not the case for point 3.
If the variables were independent and had the same variance (for example after standardisation), we would have a circle rather than an ellipse, and the Mahalanobis distance would be the ordinary Euclidean distance.
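The behaviour described above can be checked with a few lines of NumPy. This is a minimal sketch; the mean, the covariance matrix and the two test points are illustrative assumptions, not values from the study.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of point x from a distribution with the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse (numerically safer).
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

mean = np.array([0.0, 0.0])
# Assumed covariance of two positively correlated, standardised variables.
cov_corr = np.array([[1.0, 0.9], [0.9, 1.0]])

# Two points at the same Euclidean distance from the mean...
along = np.array([1.0, 1.0])     # follows the correlation
against = np.array([-1.0, 1.0])  # goes against the correlation
print(mahalanobis(along, mean, cov_corr))    # ≈ 1.03: consistent with the data
print(mahalanobis(against, mean, cov_corr))  # ≈ 4.47: breaks the correlation

# With independent variables of unit variance, it reduces to the Euclidean distance.
print(mahalanobis(against, mean, np.eye(2)), np.linalg.norm(against))
```

Both points are at Euclidean distance √2 from the mean, but only the one that goes against the correlation is far away in the Mahalanobis sense, which is what places it outside the ellipse.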
As I said earlier, we use PCA to tell us which variables are affected by the anomaly.
In a few words, PCA is a way to identify the axes along which the data vary the most. Once these axes are identified, we express the data in this new coordinate system.
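One common way to compute a PCA is the eigendecomposition of the sample covariance matrix. The sketch below follows that route under stated assumptions: the data are simulated (correlation assumed to be 0.9) and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(data):
    """PCA by eigendecomposition of the sample covariance matrix.

    Returns the variances along each principal axis (descending), the axes as
    columns, the data expressed in the new frame (scores), and the mean.
    """
    mean = data.mean(axis=0)
    centred = data - mean
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrices, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort axes by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs              # coordinates in the principal-component frame
    return eigvals, eigvecs, scores, mean

# Hypothetical correlated 2D data (standardised units).
rng = np.random.default_rng(seed=2)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2_000)
eigvals, axes, scores, mean = pca(data)
print("Variance along each principal axis:", eigvals)
print("First principal axis:", axes[:, 0])  # close to +/-(0.707, 0.707)
```

For positively correlated data, the first axis runs along the diagonal of the cloud and carries most of the variance; the second axis measures deviations from that diagonal.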
Just as an example, and to present this mathematical approach, we analysed the two parameters introduced earlier and simulated wear of the pyramidal punch, which gave us these data.
Description of the plot
Now that we have been alerted to an anomaly, we would like to know which parameters are affected by it. PCA can identify the axis along which the anomaly takes place.
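A minimal sketch of this step, assuming standardised reference data with an illustrative correlation of 0.9 and a hypothetical anomalous sample in which the punching force drops while the thickness stays normal (a made-up case, not the simulated wear from the study). The deviation is projected on each principal axis and scaled by that axis's standard deviation; the axis with the largest scaled score carries the shift, and its loadings show how each original variable contributes.

```python
import numpy as np

# Hypothetical reference data: standardised thickness and punching force, correlation 0.9.
rng = np.random.default_rng(seed=3)
ref = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=2_000)
mean = ref.mean(axis=0)
eigvals, axes = np.linalg.eigh(np.cov(ref, rowvar=False))
order = np.argsort(eigvals)[::-1]  # largest variance first
eigvals, axes = eigvals[order], axes[:, order]

# Hypothetical anomalous sample: punching force low while thickness is normal.
anomaly = np.array([0.5, -1.5])

# Project the deviation on each principal axis, scaled by that axis's standard deviation.
scores = (anomaly - mean) @ axes
scaled = scores / np.sqrt(eigvals)
shift_axis = int(np.argmax(np.abs(scaled)))
print("Scaled scores per axis:", scaled)
print("Axis carrying the shift: PC", shift_axis + 1)

# The loadings of that axis show how much each original variable contributes to it.
variables = ["thickness", "punching"]
for name, loading in zip(variables, axes[:, shift_axis]):
    print(f"{name}: loading {loading:+.2f}")
```

Here the shift lands on the second axis, whose loadings have opposite signs: the sample breaks the usual relation between thickness and punching force rather than simply being large or small overall.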
In 2D, this analysis is not very relevant.
As I said earlier, this 2D example was just a way to visualise the approach. What matters more for us is to work with the entire set of variables, in our case the 9 variables I showed you previously.
The Mahalanobis distance is more difficult to visualise in 9 dimensions. The best way is to use two types of figures:
one showing how the distance evolves over time;
one showing the histogram of the distance, whose square follows a chi-square distribution with 9 degrees of freedom (assuming normally distributed data).
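Both figures rely on the squared Mahalanobis distance and a chi-square control limit. The sketch below, in plain NumPy and the standard library, assumes simulated 9-variable reference data with an arbitrary covariance; the helper names `chi2_sf`, `chi2_ppf` and `squared_mahalanobis` are hypothetical. The chi-square tail probability is built with the standard recurrence on the degrees of freedom, so no statistics package is needed.

```python
import math
import numpy as np

def chi2_sf(x, k):
    """Chi-square survival function P(X > x) for integer dof k >= 1, using
    Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if k % 2 == 0:
        q, nu = math.exp(-x / 2), 2              # Q(x; 2) = exp(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1   # Q(x; 1) = erfc(sqrt(x/2))
    while nu < k:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q

def chi2_ppf(p, k, lo=0.0, hi=1_000.0):
    """Chi-square quantile found by bisection on the cumulative distribution."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - chi2_sf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical reference data for 9 variables (arbitrary covariance, illustration only).
rng = np.random.default_rng(seed=4)
p = 9
a = rng.normal(size=(p, p))
true_cov = a @ a.T + p * np.eye(p)       # an arbitrary positive-definite covariance
ref = rng.multivariate_normal(np.zeros(p), true_cov, size=5_000)
mean = ref.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))

def squared_mahalanobis(points):
    """Row-wise squared Mahalanobis distance to the reference mean and covariance."""
    diff = points - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

limit = chi2_ppf(0.99, p)   # 99 % control limit on the squared distance (about 21.67)
d2 = squared_mahalanobis(ref)
print(f"Control limit: {limit:.3f}")
print(f"Share of reference points above the limit: {np.mean(d2 > limit):.2%}")
```

Plotting `d2` in sampling order against `limit` gives the first figure; a histogram of `d2` (for example with `np.histogram(d2, bins=50)`) gives the second, which should resemble the chi-square density with 9 degrees of freedom when the assumptions hold.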
At that stage, we know that an anomaly is under way. Now we want to know its direction: which variables are affected by this anomaly?
Once again, the representation is more difficult in 9 dimensions. What we do is look at the principal components, which are the major axes of variation. Here we can see 9 principal components, and let's say that 3 or 4 major axes represent 75% of the total variation.
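The number of axes needed to reach a given share of the variation can be read from the cumulative explained variance. This is a minimal sketch; the eigenvalues below are hypothetical values for a 9×9 correlation matrix (standardised variables, so they sum to 9), not the results shown on the slide, and `components_needed` is an illustrative helper name.

```python
import numpy as np

def components_needed(eigvals, target=0.75):
    """Smallest number of principal components whose cumulative share of variance reaches `target`."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # largest variance first
    cumulative = np.cumsum(vals) / vals.sum()
    # searchsorted returns the first index where cumulative >= target.
    return int(np.searchsorted(cumulative, target)) + 1, cumulative

# Hypothetical eigenvalues of a 9x9 correlation matrix (they sum to 9, the number of variables).
eigvals = [4.0, 1.5, 1.0, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1]
n, cumulative = components_needed(eigvals)
print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components needed for 75 %:", n)  # 4 with these illustrative values
```

Working on the correlation matrix rather than the raw covariance matrix is a common choice when, as here, the variables are expressed in different units.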
By looking at the projection of the anomaly on the principal components we can identify the variables