J&J Thesis Presentation July 2016

  1. 1. Machine Learning Techniques In Categorical Time Series Analysis Of Manufacturing Process. Haris Michailidis, Isidora Tourni. National Technical University of Athens, School of Electrical and Computer Engineering. Professor: Nectarios Koziris. J&J Responsible: Michalis Avgoulis. Presentation Date: 26/07/2016
  2. 2. Contents ● Problem Motivation ● Visualization ● Machine Learning ● Results ● Future Work
  3. 3. Introduction
  4. 4. Problem Motivation In cooperation with Johnson & Johnson Hellas Goals: ● Visualization of Mixing Process ● Quantification of Procedures ● Classification & Clustering of processes Further Goals: ● Optimization of the Mixing Process ● Comparison with Golden Standard (Evaluation) ● Comparison between different batches of the same Product
  5. 5. Process Description Example Vessel Actions: ● Heating ● Agitation ● Addition of Materials ● Pressure adjustment PLC logging: ● Output to CSV Diagram labels: Mixing, Raw Materials, Bottling
  6. 6. Product Categories Figure labels: Emulsion, Product Cleaning Group
  7. 7. Product Categories Figure labels: Picsou C, Product Group, Apple Cream
  8. 8. Data-Set Description ● ~130,000 rows/year ● 45 message code sets (values, set-points) ● Categorical Data
  9. 9. Visualization
  10. 10. Visualization Tool Goals: ● Visualization of Mixing Process ● Selective representation of variables ● Overview with flexible Timeframe ● Accessible from multiple terminals (web interface) "The human brain processes visuals 60,000 times faster than text." (Forrester CSO Insights 2012)
  11. 11. Visualization Tool (interface) Figure: Initial Page of the Visualization Tool
  12. 12. Visualization Tool (interface) Figure: 4 days overview
  13. 13. Visualization Tool (interface) Figure: 1 day overview
  14. 14. Visualization Tool (interface) Figure: Detailed box in complex visualization
  15. 15. Machine Learning
  16. 16. Machine Learning Introduction Goal: Explore the possibilities of Machine Learning in the Manufacturing space, in order to produce useful insights for the process. ● Classification ● Clustering Unit of analysis: batch Challenges: Represent an object in an N-dimensional space ● Representation of each batch | Object Creation ● Data Cleansing / Creation of training set | Labelling ● “Distance” between batches | Distance Calculation Inspiration: ● DNA sequence analysis → Markov Models
  17. 17. Solving the Challenges 1. Data cleansing: Value/Set-Point Flattening, Typo Correction 2. Labelling of batches, through 2 files: ● Log file: containing manual entries from operators ● Mapping table: containing information for each product 3. Time-series splitting to batches: depending on business rules derived from experience and observations. Keep only production chunks.
  18. 18. Solving the Challenges 4. Feature selection: Message Number 5. Representation of each batch 6. Unequal length time-series comparison Figure: Message Mapping Table
  19. 19. Transition Matrix Concept
      Sequence 1: B-B-C-A-B-C-A-C-A-B-C
      Sequence 2: A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C
      Transition Matrix 1 (rows: current state, columns: next state):
             A     B     C
        A  0.00  0.67  0.33
        B  0.00  0.25  0.75
        C  1.00  0.00  0.00
      Transition Matrix 2 (rows: current state, columns: next state):
             A     B     C
        A  0.17  0.67  0.17
        B  0.14  0.43  0.43
        C  0.75  0.00  0.25
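The two matrices on this slide can be reproduced by counting consecutive pairs in each sequence and normalising every row. The following is a minimal sketch of that idea, not the authors' code; the function name `transition_matrix` and the choice to give never-visited source states an all-zero row are assumptions made for illustration.

```python
from collections import Counter


def transition_matrix(sequence, states):
    """Row-normalised first-order transition matrix (rows follow `states`)."""
    # Count every consecutive (current, next) pair in the sequence.
    counts = Counter(zip(sequence, sequence[1:]))
    matrix = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        # Assumption: a state that never occurs as a source gets a zero row.
        matrix.append([
            counts[(src, dst)] / row_total if row_total else 0.0
            for dst in states
        ])
    return matrix


states = ["A", "B", "C"]
seq1 = "B-B-C-A-B-C-A-C-A-B-C".split("-")
seq2 = "A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C".split("-")

m1 = transition_matrix(seq1, states)
m2 = transition_matrix(seq2, states)

for name, matrix in (("Matrix 1", m1), ("Matrix 2", m2)):
    print(name)
    for state, row in zip(states, matrix):
        print(" ", state, [round(p, 2) for p in row])
```

Because the matrix size depends only on the number of states, sequences of different lengths map to objects of the same shape, which is what makes batches of unequal length comparable.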
  20. 20. Solving the Challenges 4. Feature selection: Message Number 5. Representation of each batch 6. Unequal length time-series comparison: Chunk Object, containing: ● Transition Matrix (fixed size 45x45) ● Labels 7. Distance calculation method: Great research area Figure labels: Transition Matrix, Message Mapping Table
  21. 21. Distance Evaluation Goal: ● Distance {batch - batch} → Distance between 2D Transition Matrices Problems: 1. Choosing the proper Vector Distance Metric 2. Converting 2D Transition Matrix → Vector Solutions: 1. Distance between Vectors: ● Euclidean Distance ● Cosine Distance ● Kullback-Leibler Divergence ● Kolmogorov-Smirnov Test ● Infinity Norm
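As a rough illustration of the vector distances listed on this slide, the sketch below implements four of them with the standard library only. The function names and the `eps` smoothing in the Kullback-Leibler divergence are assumptions made here to cope with zero probabilities; the slides do not say how zeros were handled. The Kolmogorov-Smirnov test is left out because the transcript does not describe how it was applied to transition probabilities.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """1 minus the cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence of q from p (asymmetric).

    Assumption: eps is added to both terms so zero entries do not blow up.
    """
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def infinity_norm(p, q):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))


# Row "B" of the two example transition matrices from the earlier slide.
row_b_1 = [0.00, 0.25, 0.75]
row_b_2 = [0.14, 0.43, 0.43]

for fn in (euclidean, cosine_distance, kl_divergence, infinity_norm):
    print(fn.__name__, round(fn(row_b_1, row_b_2), 4))
```

Note that the Kullback-Leibler divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; a symmetric variant would be needed if a true distance is required.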
  22. 22. Distance Evaluation 2. 2D Matrix → Vector*: A. Append each row to the first B. Append each row from the diagonal matrix to the first C. Average of distances between corresponding rows * Not using Space-Filling curves due to unrelated spatial characteristics.
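Options A and C above can be sketched as follows, using the two 3x3 example matrices from the Transition Matrix Concept slide. This is an illustrative reading of the slide rather than the authors' implementation, and the helper names are hypothetical. Option B is not sketched because the transcript does not specify how the diagonal traversal is built.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def flatten_rows(matrix):
    """Option A: append each row to the first, giving one long vector."""
    return [value for row in matrix for value in row]


def mean_row_distance(m1, m2, distance):
    """Option C: average the distance between corresponding rows."""
    return sum(distance(r1, r2) for r1, r2 in zip(m1, m2)) / len(m1)


m1 = [[0.00, 0.67, 0.33],
      [0.00, 0.25, 0.75],
      [1.00, 0.00, 0.00]]
m2 = [[0.17, 0.67, 0.17],
      [0.14, 0.43, 0.43],
      [0.75, 0.00, 0.25]]

# Option A: one distance over the flattened matrices.
total = euclidean(flatten_rows(m1), flatten_rows(m2))
# Option C: one distance per row, then the mean.
per_row = mean_row_distance(m1, m2, euclidean)

print("flattened:", round(total, 4))
print("mean of rows:", round(per_row, 4))
```

With the Euclidean metric the two options are related (the squared flattened distance is the sum of squared row distances), but for metrics such as cosine or Kullback-Leibler the row-wise average can rank batches differently from the flattened form.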
  23. 23. Classification (supervised) The process of classifying objects according to shared attributes. Algorithms used: ● Nearest Centroid ● k-Nearest Neighbors Evaluation Methods: ● Accuracy ● Cohen’s Kappa (Kappa coefficient) Figure: Data split into train and test sets
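To make the classification step concrete, here is a small sketch of a nearest centroid classifier scored with accuracy and Cohen's kappa. The two-dimensional vectors and the labels "X" and "Y" are made-up toy data; in the setting of the slides each vector would presumably be a flattened 45x45 transition matrix and each label a product group. The Euclidean metric is also an assumption, since the slides compare several distances.

```python
import math
from collections import Counter, defaultdict


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fit_centroids(vectors, labels):
    """Mean vector per class."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def predict(centroids, vector):
    """Label of the closest class centroid."""
    return min(centroids, key=lambda label: euclidean(centroids[label], vector))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa is undefined when chance agreement is already perfect.
    return (observed - expected) / (1 - expected) if expected != 1 else float("nan")


# Hypothetical toy data, not taken from the slides.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
train_y = ["X", "X", "Y", "Y"]
test_x = [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.45, 0.55]]
test_y = ["X", "Y", "Y", "Y"]

centroids = fit_centroids(train_x, train_y)
y_pred = [predict(centroids, v) for v in test_x]

print("predictions:", y_pred)
print("accuracy:", accuracy(test_y, y_pred))
print("kappa:", cohen_kappa(test_y, y_pred))
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when one class dominates the data.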
  24. 24. Clustering (unsupervised) The task of grouping objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups. Algorithms used: ● k-Means Evaluation Methods: ● V-Measure ● Rand-Index
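One of the evaluation methods named above, the Rand index, can be computed by checking every pair of objects: a pair counts as an agreement when both labellings put it in the same group or both put it in different groups. The sketch below shows the plain (unadjusted) index; whether the authors used the adjusted variant is not stated in the slides, and the toy labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two labellings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    return agree / total


truth = [0, 0, 1, 1]

# Cluster ids are arbitrary, so a relabelled perfect clustering scores 1.0.
print(rand_index(truth, [1, 1, 0, 0]))
# A clustering that merges three objects into one cluster agrees on 3 of 6 pairs.
print(rand_index(truth, [0, 0, 0, 1]))
```

The index depends only on pair memberships, so it is insensitive to how the clusters happen to be numbered.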
  25. 25. Classification Results
  26. 26. Distance Comparison | Classification
  27. 27. Nearest Centroid Classifier: Train - Test Split Evaluation [1/2] Classification Baseline (ZeroR): Product Cleaning Group Accuracy: 0.520, Product Group Accuracy: 0.377 Chart callouts: 83%, 65%
  28. 28. k-Nearest Neighbors Classifier: Train - Test Split Evaluation [2/2] Classification Baseline (ZeroR): Product Cleaning Group Accuracy: 0.520, Product Group Accuracy: 0.377 Chart callouts: 73%, 55%
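The ZeroR baseline quoted on both evaluation slides is the accuracy obtained by always predicting the most frequent class of the training set. A minimal sketch, with hypothetical labels "G1" and "G2" standing in for the real product groups:

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Predict the majority training class for every test item."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(label == majority for label in test_labels)
    return majority, hits / len(test_labels)


# Hypothetical labels, not taken from the slides.
train = ["G1"] * 6 + ["G2"] * 4
test = ["G1", "G1", "G2", "G1", "G2"]

print(zero_r_accuracy(train, test))
```

Any classifier worth keeping should beat this number, which is why the slides report it next to the classifier results.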
  29. 29. Clustering Results
  30. 30. Distance Comparison | Clustering Chart callout: 33%
  31. 31. Conclusions 1) Visualization a) Visual Production Overview b) Enabling Comparison between batches 2) Machine Learning a) Valid Representation of Categorical Time-Series b) Quantification of Production Processes c) Application of Machine Learning Techniques
  32. 32. Future Work | Academic ● Research on 2D-specific Distance Metrics ● Clustering Algorithms based on Markov Models ● Classification using Transition Matrices of different Dimensions (Markov-0, 2, ..., N) ● Different Feature Selection (temperature, pressure, etc.)
  33. 33. Future Work | Business ● Data Gathering Automation ● Creation of Golden Standard for each Product ● Scoring of Production Process ● Distribution of Batches compared to the Average Batch ● Clustering to more efficient clusters based on the process
  34. 34. Thank you! Questions?
  35. 35. Appendix
  36. 36. Distance Comparison | Classification Setup: ● Algorithm: Nearest Centroid Classifier ● Attributes: Product Cleaning Group, Product Group ● Split: 80% training set, 20% test set ● Distances: All
  37. 37. Determining k in k-Nearest Neighbors Setup: ● Algorithm: k-Nearest Neighbors ● Attributes: Product Cleaning Group, Product Group ● Split: 80% training set, 20% test set ● Distances (Average of): Euclidean total, Cosine vector, KL-Divergence diagonal
  38. 38. Train - Test Split Evaluation Setup: ● Algorithm: Nearest Centroid Classifier, k-Nearest Neighbors ● Attributes: Product Cleaning Group, Product Group ● Split (train-test): 80% - 20%, 65% - 35%, 50% - 50% ● Distances (Average of): Euclidean total, Cosine vector, KL-Divergence diagonal
  39. 39. Distance Comparison | Clustering Setup: ● Algorithm: Baseline, k-Means ● Attributes: Product Cleaning Group, Product Group ● Initial Centroid Sets Type: (a) All centroids of each set belong to different clusters (Alldiff), average of 20 sets; (b) All centroids of each set belong to the same cluster (Allsame), average of 20 sets ● Distances: All
  40. 40. Impact of Initial Centroids Setup: ● Algorithm: Baseline, k-Means ● Attributes: Product Cleaning Group, Product Group ● Initial Centroid Sets Type: (a) All centroids of each set belong to different clusters (Alldiff), average of 100 sets; (b) All centroids of each set belong to the same cluster (Allsame), average of 100 sets; (c) All centroids of each set belong to a random cluster (Allrandom), average of 100 sets ● Distances (Average of): Euclidean Total, Euclidean Row, Euclidean Column
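The experiment above varies where k-Means starts. The sketch below is a plain k-Means loop that takes the initial centroids as an argument, so an "Alldiff" start (seeds drawn from different true clusters) can be compared with an "Allsame" start (seeds drawn from one true cluster). The points are made-up 2D data, the Euclidean metric is an assumption, and the function is an illustration rather than the authors' code.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: two well-separated groups.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]

# "Alldiff": one seed from each true group.
a_diff, c_diff = k_means(points, [[0, 0], [10, 10]])
# "Allsame": both seeds from the same true group.
a_same, c_same = k_means(points, [[0, 0], [0, 1]])

print("Alldiff:", a_diff, c_diff)
print("Allsame:", a_same, c_same)
```

On this well-separated toy data both starts reach the same partition after a few iterations; with overlapping clusters the starting centroids can lead to different local optima, which is presumably why the slides average over many initial sets.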
  41. 41. Determining k in k-Nearest Neighbors [1/2] Accuracy: Average: 0.727, Deviation: <1% Kappa: Average: 0.531, Deviation: ~2%
  42. 42. Determining k in k-Nearest Neighbors [2/2] Accuracy: Average: 0.560, Deviation: <1% Kappa: Average: 0.391, Deviation: ~1%
  43. 43. Distance Comparison | Classification [2/2]
  44. 44. Distance Comparison | Clustering [2/2]
  45. 45. Impact of Initial Centroids [2/2]
  46. 46. Challenges in ML Labelling: 1. Data cleansing 2. Labelling of batches Object Creation: 3. Time-series splitting to batches 4. Representation of each batch (chunk) 5. Feature selection 6. Unequal length time-series comparison Distance Calculation: 7. Distance calculation method
  47. 47. Impact of Initial Centroids
