Bridging The Gap Between Data Knowledge

Discover how data mining can benefit you & explore the root cause of process / product deviations using Design of Experiments


  1. Bridging the gap between data and knowledge with The Unscrambler X. Discover how data mining can benefit you. Marion Cuny, CAMO Software AS, www.camo.com
  2. Content: 1. Improve your work time efficiency 2. Combine data from many sources for enhanced understanding of complex systems 3. Understand the structure of your data and locate the root cause of process/product deviations 4. Design more efficient processes and products 5. Predict quality at an early stage and classify raw material/batch attributes 6. Conclusions
  3. Improve your work time efficiency
  4. Organized and annotated projects and audit trail (Project Navigator, Info and Notes boxes). Know the project progression by looking at the: • Project organization, • Audit trail and • Information and notes displayed for each object.
  5. Preview the results of your pretreatment. Save time in optimizing the parameters of your pretreatments before performing them.
  6. Conclusion • Organized data save you a lot of time! What did I/my colleague do last month with this dataset? What was the plot that was showing the results? • Preview of results: don’t do things that don’t give good results.
  7. Combine data from many sources for enhanced understanding of complex systems
  8. Import data from various sources: Unscrambler matrices; ASCII Text; Excel; Matlab; Spectral formats; Database (Oracle, SQL, ..). Also possible to use copy‐paste and drag and drop.
  9. Our Instrument Partners
  10. System Integration Partners • Integration for online monitoring and control: – Siemens SiPAT – Optimal SynTQ – Symbion – ABB XPAT & FTSW integration – GE Fanuc
  11. OPC import menu
  12. Imported data
  13. Combine them in the analysis • X and Y matrices can be in separate datasets • Aggregate matrices
  14. Conclusion • See relationships and create models between any kind of data: – Different types – Different stages of the process and get a clear understanding of what is going on.
  15. Understand the structure of your data and locate the root cause of process/product deviations
  16. Fundamentals of Multivariate Statistical Process Control (plot axes: Variable 1, Variable 2) • The ellipse is known as Hotelling’s T2 ellipse and represents a 95% confidence region. • There are regions in the multivariate control chart that are forbidden in the univariate charts. • There are also regions that are in control in the univariate sense but out of control in a multivariate sense.
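
The two bullets about the ellipse can be checked numerically. The sketch below is an illustration only, not The Unscrambler's implementation: the process mean, the covariance (correlation 0.9) and the ±1.96 sigma univariate limits are hypothetical, and the 95% limit uses the large-sample chi-square approximation with 2 degrees of freedom rather than an F-based limit.

```python
import math

# Hypothetical in-control process: two variables, unit variance, correlation 0.9.
MEAN = (0.0, 0.0)
COV = ((1.0, 0.9), (0.9, 1.0))

# 95% quantile of a chi-square distribution with 2 degrees of freedom.
# For 2 dof the quantile has the closed form -2 * ln(1 - p).
T2_LIMIT = -2.0 * math.log(0.05)

# Hypothetical univariate limits: +/- 1.96 standard deviations per variable.
UNIVARIATE_SIGMAS = 1.96


def hotelling_t2(point, mean=MEAN, cov=COV):
    """T^2 = (x - m)^T S^-1 (x - m) for two variables (closed-form 2x2 inverse)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def in_univariate_limits(point):
    """True if each variable, looked at alone, is inside its own limits."""
    variances = (COV[0][0], COV[1][1])
    return all(
        abs(value - m) <= UNIVARIATE_SIGMAS * math.sqrt(var)
        for value, m, var in zip(point, MEAN, variances)
    )


def in_ellipse(point):
    """True if the point lies inside the approximate 95% T^2 ellipse."""
    return hotelling_t2(point) <= T2_LIMIT


against_correlation = (1.5, -1.5)  # fine per variable, breaks the correlation
along_correlation = (2.1, 2.1)     # flagged per variable, follows the correlation

for sample in (against_correlation, along_correlation):
    print(sample, in_univariate_limits(sample), in_ellipse(sample))
```

The first sample passes both univariate checks but falls outside the ellipse because it contradicts the correlation between the variables; the second is flagged by the univariate limits yet sits inside the ellipse.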
  17. Design Space: As defined by ICH Q8. The multidimensional combination and interaction of input variables and process parameters that have been demonstrated to provide assurance of quality. (Diagram labels: Design Space, Desired State, Undesired State)
  18. NIR Spectroscopy for monitoring the granulation process • Acquire NIR spectra during the process • Goal: Understand batch behavior, and follow process trajectories with PCA. High Shear Granulator (Glatt TMG) with diffuse reflectance probe and NIR spectrometer collecting spectra at 2 second intervals.
  19. High Shear Wet Granulation • Granulation process is important to: • increase particle size • enhance compressibility • improve hydrophilicity • improve product homogeneity • The process has three stages: • Dry mix phase - lactose & starch (2 minutes) • Liquid addition phase – PVP and water (1-2 minutes) • Granulation (3-5 minutes)
  20. Granulation batches studied • Diffuse reflection NIR spectra collected at 2-3 second intervals for 15 batches, giving 130-180 spectra per batch • Each spectrum 1100-2200 nm (1101 variables) • First three batches run at target conditions – Some process changes in terms of addition rates, impeller speeds, granulation time in other batches • PCA model to find patterns and groupings, and model the granulation process
  21. First derivative NIR spectra of HSG process. Color coded to highlight the stages of the process: Mixing of lactose & starch; Liquid Addition – water & PVP; Granulation. OH peaks increase on addition. Change in CH bands due to binders.
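
As a rough illustration of why a first derivative is a common pretreatment for NIR spectra, the sketch below uses plain finite differences on a made-up seven-point "spectrum". This is an assumption for illustration: the slides do not say which derivative algorithm was used, and smoothing derivatives such as Savitzky-Golay are the more usual choice in practice.

```python
def first_derivative(spectrum, step=1.0):
    """Finite-difference first derivative.

    Interior points use central differences; the two end points use
    one-sided differences. `step` is the wavelength spacing (e.g. 1 nm).
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points to differentiate")
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step))
    return derivative


# Hypothetical absorbance values around a single peak.
raw = [0.10, 0.12, 0.18, 0.30, 0.18, 0.12, 0.10]
# The same peak measured with a constant baseline offset of 0.5.
shifted = [value + 0.5 for value in raw]

d_raw = first_derivative(raw)
d_shifted = first_derivative(shifted)

# The constant offset disappears after differentiation.
print(max(abs(a - b) for a, b in zip(d_raw, d_shifted)))
```

Because a constant baseline offset differentiates to zero, the two derivative curves coincide (up to floating-point rounding), which makes changes in band shape easier to compare between spectra.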
  22. PCA analysis: line plot of PC score 1. Batches 4 & 5 differ: no PVP was added during the liquid addition phase. Batch 6: target conditions with longer granulation time.
  23. PCA score plots of 3 batches run under target conditions (plot labels: Dry mixing phase, Liquid addition phase, Granulation – end point)
  24. Granulation trajectory from 3-D Scores plot (plot labels: Dry mix, Liquid addition, Granulation ‐ end)
  25. Conclusion • The structure of a data set is revealed by PCA. • Note: sometimes you need pre-treatment to reveal the structure accurately.
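
To make the idea of "structure revealed by PCA" concrete, here is a minimal sketch that extracts the first principal component of a tiny mean-centered data table by power iteration. The six rows are invented stand-ins for spectra from two process phases, and power iteration is simply a convenient dependency-free method chosen here; it is not claimed to be the algorithm used by The Unscrambler.

```python
import math


def mean_center(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]


def project(rows, loading):
    """Scores: projection of each row onto the loading vector."""
    return [sum(v * w for v, w in zip(row, loading)) for row in rows]


def first_pc(rows, iterations=200):
    """Return (loading, scores) of PC1 via power iteration on X^T X."""
    x = mean_center(rows)
    n_vars = len(x[0])
    loading = [1.0 / math.sqrt(n_vars)] * n_vars  # arbitrary unit start vector
    for _ in range(iterations):
        scores = project(x, loading)
        # Multiply by X^T: new_j = sum_i scores_i * x_ij
        new = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:
            raise ValueError("data has no variance along the start direction")
        loading = [v / norm for v in new]
    return loading, project(x, loading)


# Hypothetical measurements: rows 0-2 from one phase, rows 3-5 from another.
data = [
    [1.0, 2.0, 0.5],
    [1.1, 2.1, 0.5],
    [0.9, 1.9, 0.6],
    [2.0, 4.0, 0.5],
    [2.1, 4.2, 0.6],
    [1.9, 3.9, 0.5],
]

loading, scores = first_pc(data)
print([round(s, 2) for s in scores])
```

In this toy table the PC1 scores of the first three rows are negative and those of the last three are positive, so a line plot of score 1 against time would show the step between the two phases.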
  26. Design more efficient processes and products
  27. Principle of DoE • Perform the least number of experiments to cover the design space in an efficient way. (Two plots of X1 against X2, each axis running from min to max)
  28. Why do we use DoE compared to the “scientific approach”? • One variable at a time approach: In order to establish a relationship between cause and effect, each cause must be investigated separately, all other conditions being fixed. • The limit of the one variable at a time approach: (plots of X1 against X2 marking the actual optimum)
  29. The logical approach: • Set the goal of the experimentation (model type). Ex: Maximize the quality of our cookies: Quadratic model • Select the variables to include in the design (X). Ex: Cooking time, temperature, chocolate content • Select the response variables (Y). Ex: Stability, preference, cost • Select the appropriate design. Ex: CCD, BBD
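
The contrast between a designed experiment and the one-variable-at-a-time approach can be sketched with the cookie factors from the slide. The factor levels below are hypothetical, and the design shown is a plain two-level full factorial with one center point, which is simpler than the CCD and BBD designs named on the slide.

```python
from itertools import product

# Hypothetical low/high levels for the three cookie factors.
FACTORS = {
    "cooking_time_min": (8, 12),
    "temperature_c": (160, 200),
    "chocolate_pct": (10, 30),
}


def full_factorial(factors):
    """Every combination of low/high levels: 2^k runs for k factors."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


def center_point(factors):
    """Run in the middle of the design space, useful to detect curvature."""
    return {name: (low + high) / 2 for name, (low, high) in factors.items()}


def one_variable_at_a_time(factors):
    """Baseline with all factors low, then raise one factor at a time."""
    baseline = {name: low for name, (low, _high) in factors.items()}
    runs = [dict(baseline)]
    for name, (_low, high) in factors.items():
        run = dict(baseline)
        run[name] = high
        runs.append(run)
    return runs


design = full_factorial(FACTORS) + [center_point(FACTORS)]
ovat = one_variable_at_a_time(FACTORS)

print(len(design), "designed runs;", len(ovat), "one-variable-at-a-time runs")
```

The one-variable-at-a-time plan never changes two factors together, so it cannot show whether, for example, a higher temperature only pays off with a shorter cooking time. The factorial runs cover every corner of the design space and therefore allow such interactions to be estimated.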
  30. Start tab
  31. Define variables tab. All the variables are defined in the same table. Easy definition thanks to the tick box menu and radio buttons.
  32. Choose the design tab. Auto‐selection of the best suiting design. Designs stated as actions. Information on the selected design.
  33. Design details. Select the resolution of the design depending on your goal and the number of experiments to run.
  34. Additional experiments
  35. Randomization
  36. Summary. The calculation of the power for the two response variables shows that this design is not appropriate for detecting a difference of 0.6 in preference, as the power is below 0.8. We can look for the LSD that can be detected.
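
The reasoning behind the 0.8 power criterion can be sketched with a normal approximation for comparing the mean response at two factor levels. Everything numeric here except the 0.6 difference and the 0.8 threshold is an assumption: the noise standard deviation, the number of runs per level and the approximation itself are illustrative, and the software's own power calculation may differ.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(difference, sigma, runs_per_level, alpha=0.05):
    """Normal-approximation power of a two-sided test on two level means.

    difference     -- true difference in mean response to detect
    sigma          -- standard deviation of the experimental noise (assumed known)
    runs_per_level -- number of runs at each of the two levels
    """
    normal = NormalDist()
    std_error = sigma * sqrt(2.0 / runs_per_level)
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    shift = difference / std_error
    # Probability that the test statistic falls in either rejection tail.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical noise level of 0.5 preference units.
small_design = approximate_power(difference=0.6, sigma=0.5, runs_per_level=4)
larger_design = approximate_power(difference=0.6, sigma=0.5, runs_per_level=12)

print(round(small_design, 2), round(larger_design, 2))
```

Under these assumed numbers the smaller design falls below the 0.8 threshold for a difference of 0.6, while tripling the runs per level brings it above the threshold. The alternative mentioned on the slide is to keep the design and ask what difference it can detect instead.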
  37. Tables in X
  38. Analysis
  39. Results: Effect summary
  40. Results: Diagnostics. Probable curvature effect
  41. Results: Residuals. Or maybe a bias at the end of experimentation.
  42. Extension of the design
  43. Extension of the design
  44. Results: Response surface
  45. Conclusion • DoE helps you to: – Create – Improve a process or product.
  46. Predict quality at an early stage and classify raw material/batch attributes
  47. Visualizing groups • PCA score plot • Clustering. Make a model to predict the group: SIMCA, PLSDA, SVM and LDA
  48. SIMCA Classification • Soft Independent Modeling of Class Analogies: – Make a PCA model for each class; – Project new samples onto the model. (Diagram: samples from groups A, B and C, each with its own PC1 axis; labels mark the center of the model, PC2, the maximum distance to the model (Si) and the maximum leverage for the model (Hi))
  49. SIMCA Classification • Soft Independent Modeling of Class Analogies: – Make a PCA model for each class; – Project new samples onto the model. (Diagram: samples from groups A, B and C, each with its own PC1 axis; PC2 shown for group A)
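
The two SIMCA steps (one PCA model per class, then projection of new samples) can be sketched in two dimensions with a one-component model per class. This is a simplified illustration under stated assumptions: the training points are invented, the principal axis uses the closed-form angle for a 2x2 covariance matrix, and the acceptance limits are set to 1.5 times the largest training value, a hypothetical stand-in for the statistical limits on distance to the model (Si) and leverage (Hi).

```python
import math

LIMIT_FACTOR = 1.5  # hypothetical margin; real SIMCA uses statistical limits


def distances(model, point):
    """Return (score along PC1, residual distance orthogonal to PC1)."""
    dx = point[0] - model["centre"][0]
    dy = point[1] - model["centre"][1]
    ax, ay = model["axis"]
    score = dx * ax + dy * ay
    residual = abs(-dx * ay + dy * ax)
    return score, residual


def fit_class_model(points):
    """One-component PCA model of a 2-D class with simple acceptance limits."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (n - 1)
    # Direction of largest variance for a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    model = {"centre": (cx, cy), "axis": (math.cos(angle), math.sin(angle))}
    training = [distances(model, p) for p in points]
    model["max_score"] = LIMIT_FACTOR * max(abs(t) for t, _ in training)
    model["max_residual"] = LIMIT_FACTOR * max(r for _, r in training)
    return model


def classify(models, point):
    """Names of all class models that accept the point (may be empty)."""
    accepted = []
    for name, model in models.items():
        score, residual = distances(model, point)
        if abs(score) <= model["max_score"] and residual <= model["max_residual"]:
            accepted.append(name)
    return accepted


# Hypothetical training samples for two classes.
models = {
    "A": fit_class_model([(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.1)]),
    "B": fit_class_model([(8, 0.1), (9, -0.1), (10, 0.1), (11, -0.1), (12, 0.0)]),
}

print(classify(models, (2.5, 2.5)))   # sample lying along class A
print(classify(models, (6.0, -3.0)))  # foreign sample
```

Because every class has its own model, a sample can be accepted by one class, by several, or by none. A foreign sample that fits no model ends up with an empty list, which is the "rejected by all models" outcome.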
  50. Example dataset. NIR data of: • 83 samples: 67 calibration and 16 test • 2600 variables • 5 groups but only 4 for creating the models
  51. Overview PCA scores plot of training samples from 4 classes
  52. Classification • PCA model on independent classes
  53. Classification of the new samples. All the foreign samples are rejected by all models. MCC sample not recognized by its model.
  54. The MCC sample is detected as an outlier as its leverage is too high
  55. PLS Discriminant Analysis • Each class is represented by a 0 / 1 variable: – Build a regression model with those variables as responses (PLS1 for 1 or 2 classes, else PLS2); – Make predictions for new samples: close to 1 means “member”, close to 0 “non member”. Classification table (columns A, B, C): samples from group A are coded 1 0 0, samples from group B 0 1 0, samples from group C 0 0 1. (Plots: predicted against measured values on a 0 to 1 scale for Model A, Model B and Model C)
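
The 0 / 1 coding and the "close to 1 means member" rule can be sketched as follows. The regression step itself is left out, so the predicted rows are invented numbers standing in for the output of a fitted PLS model, and the 0.5 cut-off is a commonly used but here hypothetical choice.

```python
CLASSES = ["A", "B", "C"]


def one_hot(labels, classes=CLASSES):
    """Code each sample as one 0/1 variable per class (the Y block)."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


def assign_class(predicted_row, classes=CLASSES, threshold=0.5):
    """Pick the class with the largest predicted value.

    Returns None when no prediction reaches the threshold, i.e. the
    sample does not look like a member of any modelled class.
    """
    best = max(range(len(classes)), key=lambda j: predicted_row[j])
    return classes[best] if predicted_row[best] >= threshold else None


# Y block for three training samples, one from each group.
y_block = one_hot(["A", "B", "C"])

# Hypothetical predictions for two new samples.
clear_member = [0.92, 0.05, 0.10]
uncertain_sample = [0.30, 0.20, 0.35]

print(y_block)
print(assign_class(clear_member), assign_class(uncertain_sample))
```

A prediction vector with no value near 1 is returned as unassigned rather than forced into the nearest class. Unlike SIMCA, though, a discriminant model has no explicit distance-to-model check, so it is usually paired with diagnostics such as prediction uncertainty.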
  56. Example data set. Spectra. Category variables: 2 values: 0 & 1
  57. Good models for all groups
  58. Prediction
  59. Prediction on the AciDiSol model. A lot of uncertainty on the foreign samples.
  60. Prediction on the MCC model. A lot of uncertainty on the foreign samples. MCC is well classified.
  61. Inlier vs Hotelling T2. MCC20 is an inlier
  62. Conclusions • MVA can be used for classification / characterization as well as quantification purposes • Samples are assigned to a group or not, or get a specific predicted value, and you get diagnostic tools to understand the results • Diagnostics made at an early stage enable you to correct for deviation and decrease the cost of waste/reproduction.
  63. Conclusions
  64. Objectives and Tools. Objective: • Process Understanding • Identification and understanding of raw materials • Product and Process Development • Root Cause Analysis • Prediction of Quality. The Unscrambler X: • Design of Experiments (DoE) • Statistical Hypothesis Tests • Exploratory Data Analysis • Regression modelling • Classification • Prediction. Cycle: Define, Design, Analyze, Implement, Improve
  65. General Conclusions • Multivariate analysis: – gives you a global picture. – is an understanding tool. – is an improving tool.
  66. Benefits • Multivariate analysis in The Unscrambler X benefits: – Team work (project architecture, notes, info) – Reporting work (informative plots, report generator)
  67. Archived webinars: www.camo.com/training/webinars‐seminar.html
  68. Global Presence. Head office: Oslo, Norway. Sales offices: Japan; Sydney, AU; Woodbridge, NJ. R&D: Bangalore, India. Resellers / Distributors
  69. Questions. Marion Cuny: marion@camo.no
