Quality Control and Improvement in Manufacturing



AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber. "Modern Operational Research and Its Mathematical Methods" course.



  1. 4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009. Quality Control and Improvement in Manufacturing. Gülser Köksal, Sinan Kayalıgil (Department of Industrial Engineering, METU, Ankara, Turkey); Gerhard-Wilhelm Weber, Başak Akteke-Öztürk (IAM, METU, Ankara, Turkey).
  2. Project Team: Gülser Köksal (IE), Nur Evin Özdemirel (IE), Sinan Kayalıgil (IE), Bülent Karasözen (MATH, IAM), Gerhard Wilhelm Weber (IAM), İnci Batmaz (STAT), Murat Caner Testik (IE), İlker Arif İpekçi (IE), Berna Bakır (IS), Fatma Güntürkün (STAT), Başak Öztürk (IAM), Fatma Yerlikaya (IAM). Other collaborators: Esra Karasakal (IE), Zeev Volkovich (CS, Israel), Adil Bagirov (AOpt, Australia), Özge Uncu (IE, Canada), Pakize Taylan (IAM), Süreyya Özöğür (IAM), Elçin Kartal (STAT), Selcan Cansız (STAT & IE).
  3. Outline: project objectives; quality improvement (QI); data mining (DM); DM applications in QI in the literature; DM applications in the project: casting QI problem (decision trees, neural nets, clustering), driver seat design problem (decision trees), PCB QI problem (association); other approaches: nonlinear/robust regression; conclusion.
  4. Project Objectives: determine which DM approaches can be used effectively in QI; test the performance of DM approaches on selected quality design and improvement problems, especially those with voluminous data and multiple input and quality characteristics; develop more effective approaches to solve such problems.
  5. Project Scope: manufacturing industries that keep records of various input and quality characteristics; QI problems for which traditional analysis and solution approaches are ineffective because of too many variables and complicated relationships; "parameter design optimization" and "quality analysis" types of quality problems.
  6. The Approach: collect appropriate data from different industries for different quality problems; apply appropriate DM techniques to solve those problems; compare the performance of the DM techniques; determine which DM techniques can be used effectively for which types of QI problems; develop new or improved algorithms.
  8. Quality Control and Improvement Activities, by product development stage:
     - Product design: concept design; parameter design (design optimization); tolerance design.
     - Manufacturing process design: concept design; parameter design (design optimization); tolerance design.
     - Manufacturing: quality monitoring; process control; inspection/screening; quality analysis.
     - Customer usage: warranty and repair/replacement.
  9. Parameter Design Optimization. Static problem: find settings of the manipulated input for a fixed output target and minimum variability. Dynamic problem: find settings of the manipulated input for changing output targets and minimum variability. [Diagram: manipulated input and disturbance (measured and unmeasured) → PRODUCT/PROCESS → measured output.]
  10. Dynamic Manufacturing Environment. Goal: to keep the process output within target specifications with the smallest amount of variation around the target, despite disturbances (assignable causes, noise). Statistical process control (quality monitoring) detects assignable causes; engineering process control adjusts the manipulated input. [Diagram: manipulated input and disturbance (measured and unmeasured) → PROCESS → measured output.]
  11. Static Manufacturing Environment. Goal: to keep the process output within target specifications with the smallest amount of variation around the target, despite disturbances (assignable causes, noise). Quality analysis: relate the measured/manipulated input to the output. [Diagram: manipulated input and disturbance (measured and unmeasured) → PROCESS → measured output.]
  12. Quality Control and Improvement Activities: Quality Analysis. Quality analysis consists of:
     - finding characteristics critical to quality (CTQ);
     - finding input variables that significantly affect the quality output;
     - predicting quality (the quality output is a real-valued variable): finding empirical models that relate input characteristics to output quality characteristics, and using such models to predict the resulting quality characteristics for a given set of input parameters;
     - classification of quality (for nominal, binary or ordinal outputs): predicting the class of the quality output for a given set of input parameters.
  13. DATA MINING
  14. Data Mining. Data mining (knowledge discovery in databases): extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large databases. What is not data mining? (Deductive) query processing; expert systems or small ML/statistical programs.
  15. Data Mining: A KDD Process. Data mining is the core of the KDD process: databases → data cleaning and data integration → data warehouse → data selection and preprocessing → task-relevant data → data mining → pattern evaluation.
  16. Data Mining Techniques: Supervised Learning. Classification and regression: decision trees; neural networks; support vector machines; Bayesian belief networks; nonlinear robust regression; rule induction; association rules; rough set theory.
  17. Data Mining Techniques: Unsupervised Learning. Clustering (k-means, fuzzy c-means, hierarchical, mixture of Gaussians); neural networks (self-organizing maps); outlier and deviation detection; trend analysis and change detection.
  18. Some Applications: market research and customer relationship management; risk analysis and management; fraud detection; text and web analysis; intelligent inquiry; process modelling; supply chain management.
  19. Supply Chain Management Applications: reducing the risk of accepting bad credit cards in e-commerce payments; controlling inventory by analyzing past business, monitoring present transactions and predicting future sales; controlling inventory by predicting customers' behavior patterns (e-commerce); CRM (clustering customers, understanding their needs and behaviors, etc.). Source: Kusiak, A., "Data Mining in Design of Products and Production Systems", Proceedings of INCOM 2006, Vol. 1, pp. 49-53.
  20. Some DM Applications on QI Problems: predicting quality for given process parameter levels; finding optimal process parameter levels for quality; determining the effects of equipment on quality; determining the effects of factors/parameters on quality; tolerancing; identifying relationships among several quality characteristics; detecting, in a timely manner, the assignable causes that make a process go out of control (become unstable).
  21. Some Applications in Literature: integrated circuit manufacturing: Fountain et al. (2000), Kusiak (2000); packaging manufacturing: Abajo et al. (2004); semiconductor wafer manufacturing: Gardner (2000), Kusiak (2000), Bae (2005), Chen (2004), Braha (2002), Hu (2004), Dabbas (2001), Fan (2001), Mieno (1999), Skinner (2002); sheet metal assembly: Lian et al. (2002).
  22. Some Applications in Literature (cont'd): steel production: Cser et al. (2001); chemical manufacturing: Shi et al. (2004), Gillblad (2001), Sun (2003); ultra-precision manufacturing: Huang & Wu (2005); conveyor belt manufacturing: Hou et al. (2003), Hou (2004); plastic manufacturing: Ribeiro (2005).
  23. Literature Survey (DM Applications on Selected QI Problems). [Chart: number of papers per year, 1997-2007, for four problem types: finding CTQs, predicting quality, classification of quality, parameter optimization.]
  24. Literature Survey (cont'd). [Charts: number of papers by DM technique (e.g., DT, ANN and its variants, SVM, RST, FST, GA, ANOVA, R) for finding CTQs and for classification of quality.]
  25. Literature Survey (cont'd). [Charts: number of papers by DM technique (e.g., ANN and its variants, DT, R, GA, FST, TM) for predicting quality and for parameter optimization.]
  26. QI Problems: Examples from the Project. Casting manufacturing; driver seat design; circuit board manufacturing.
  27. CASTING QUALITY IMPROVEMENT PROBLEM: The Company. RKN is a casting company with two factories located in Ankara. It manufactures intermediate goods for the automotive, agricultural tractor and motor industries. RKN applies Six Sigma (6σ) methodology to improve its processes.
  28. CASTING QUALITY IMPROVEMENT PROBLEM: Some Products. Transmission cases; engine block; oil pan; gearbox.
  29. CASTING QUALITY IMPROVEMENT PROBLEM: Some Research Questions. Is there any relation between defect types and process parameters? Do the important factors for different defect types interact? Which process parameter levels are better at reducing defects?
  30. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM: The Company. TFD, located in Bursa, is one of the largest automobile manufacturers in Turkey. It would like to improve the design of the driver seat of a commercial vehicle to increase customer satisfaction. The driver seat is a critical part of an automobile that affects the buying decision.
  32. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM: Some Research Questions. Which customer characteristics affect overall satisfaction with the seat? What are the characteristics of customers who are highly satisfied or highly dissatisfied with the seat? Which features of the seat affect overall satisfaction with it?
  33. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM: The Company. VPC is one of the largest electronic equipment manufacturers in Turkey. It produces approximately 35-40 thousand PCBs per day and 1.5-2 million PCBs per month; 70-80 thousand PCBs are scrapped every month. The company would like to minimize PCB failures.
  34. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM: The Products. Final products: DVD player/recorder, DivX player, AV receiver, digital satellite receiver, digital TV receiver, digital media adapter. Component of interest: various PCBs (printed circuit boards) = board + integrated circuits + resistors + capacitors + diodes.
  35. CIRCUIT BOARD QUALITY IMPROVEMENT PROBLEM: Some Research Questions. Which defect types occur together? What are the root causes of the defects? Do suppliers affect the defects? Do defects occur at certain locations on the board?
  36. Data Mining Software Used in the Project: SPSS Clementine, Matlab, Statistica, QC Miner, MARS.
  37. Decision Trees
  39. RKN's Quality Objectives. Decrease the percentage of defective items through the choice of process parameters. Priorities: products suffering from a high percentage of defects, and products with a larger share of the total tonnage even if their percent defective is lower. Decrease the percentage of product returns due to defects detected by customers.
  40. Objectives. Decrease the proportion of defective items (to a certain target value); identify the most important process parameters affecting quality; find the operating ranges of these parameters (future direction); optimize the proportion of defective items (future consideration).
  41. Perkins 021 Cylinder Head. The Perkins 021 cylinder head is one of the two products chosen for the analysis from the second casting plant. Reasons: problems with the Perkins cylinder head; availability of the data; volume of the data.
  42. Data Collection. Data at RKN come from several processes and different time periods (weekly, daily, hourly). Most of the data come from the core shop, molding and melting.
  43. Data Collection (cont'd). Lot: total production in a day (one or more shifts). Daily records consist of the total volume of production, the total count of defective products and the distribution of defect types. The recorded response variables are: the total number of defective products; the number of defective products for each of 19 defect types; the number of defective products returned by the customer (newly added).
  44. Data of the Core Shop. Cores are produced according to a weekly production plan. The cores used for a product are ready one or two days before use. The specific cores used in a shift cannot be identified accurately: production may stop for a while, and even cores produced three or more days earlier can be put to use arbitrarily.
  45. The Data. Five months' production data; number of records: 95 (averages for 95 days); inputs: 47 real-valued; outputs: 8 discrete (can be transformed into binary, nominal or ordinal variables if needed); some missing data. After preprocessing: 6 real-valued, uncorrelated response variables (proportions of defect types) plus 1 total response (proportion of defective items); 36 real-valued feature (predictor) variables; 92 observations.
  46. Problem Settings: Univariate vs. Multivariate Modeling. Example data with observations i, features x1, x2 (index k) and responses y1, y2 (index j); univariate modeling treats each response separately, while multivariate modeling treats them jointly.
       x1       x2       y1  y2
       126.00   135.00   1   0
       120.00   140.00   1   0
       110.00   120.00   1   0
       102.00   131.00   1   0
       130.00   125.00   1   0
       285.00   115.00   0   0
       296.00   140.00   0   0
       275.00   129.00   0   0
       260.00   128.00   0   0
       280.00   106.00   0   0
       105.00   306.00   0   1
       113.00   308.00   0   1
       122.00   306.00   0   1
       128.00   329.00   0   1
       145.00   334.00   0   1
       287.00   329.00   1   1
       279.00   324.00   1   1
       291.00   335.00   1   1
       260.00   340.00   1   1
       270.00   321.00   1   1
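  47. Univariate Decision Tree Methodology: CART (continuous data). The decision tree model uses the (least) squares deviation impurity measure $R(t) = \frac{1}{N(t)} \sum_{i \in t} (y_i - \bar{y}(t))^2$, where $N(t)$ is the number of observations in node $t$ and $\bar{y}(t)$ is their mean response. A split $s$ of node $t$ into children $t_L$ and $t_R$ is evaluated by the decrease in impurity $\Phi(s,t) = R(t) - p_L R(t_L) - p_R R(t_R)$, where $p_L$ and $p_R$ are the proportions of the observations in $t$ sent to $t_L$ and $t_R$. A typical generated rule: IF X22 > 13.275 AND X9 > 3.095 THEN %Y6 = 0.006 (support = 48/92).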
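A minimal sketch of how the least-squares criterion $R(t)$ and the impurity decrease $\Phi(s,t)$ select a split on one continuous feature. It assumes candidate thresholds at midpoints between consecutive observed values (a common CART convention); the function names and toy data are illustrative, not taken from Clementine or the project data.

```python
def lsd(ys):
    """Least-squares deviation R(t): mean squared deviation from the node mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def best_split(xs, ys):
    """Best threshold s for one continuous feature (left: x <= s, right: x > s).

    Returns (s, phi) maximizing phi = R(t) - pL * R(tL) - pR * R(tR);
    s is None if no split reduces impurity.
    """
    n = len(ys)
    parent = lsd(ys)
    best_s, best_phi = None, 0.0
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        s = (lo + hi) / 2  # candidate threshold between consecutive distinct values
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        phi = parent - len(left) / n * lsd(left) - len(right) / n * lsd(right)
        if phi > best_phi:
            best_s, best_phi = s, phi
    return best_s, best_phi


xs = [1, 2, 3, 10, 11, 12]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(best_split(xs, ys))  # (6.5, 0.25)
```

A full tree applies this search to every feature in every node and recurses; rules such as the one above are read off the root-to-leaf paths.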
  48. Research Questions. Can we reduce the problem dimension by extracting only the important features? Is there any relation between defect types and process parameters? Do the important factors for different defect types interact? Are there significant changes in process parameters when a defect rate is high or low? Which process parameter levels are better at reducing defects? Is there any specific period in which high defect rates occur? Is there any pattern in the sequence of defect type occurrences?
  49. Feature Reduction: feature selection; decision trees; PCA.
  50. Univariate Decision Tree Methodology: Nominal Data. Number of records: 748. Analysis accuracy: 93.45%. Inputs: x32, x12, x22, x13, x2, x19, x10, x9, x36, x8, x28. Tree depth: 9.
     Results for output field y (comparing the prediction $C-y with y):
                  Training          Testing
       Correct    699 (93.45%)      294 (92.74%)
       Wrong       49 (6.55%)        23 (7.26%)
       Total      748               317
     Coincidence matrix for $C-y (rows show actuals, columns show predictions):
       Training:     0     1     2     Row %
               0    49     0     3     94.2%
               1     0   224    19     92.1%
               2     0    27   426     94%
       Testing:      0     1     2
               0    18     0     2
               1     0   115     4
               2     0    17   161
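The accuracy figures above can be checked directly from the coincidence matrices: overall accuracy is the diagonal sum divided by the total, and, since rows are actual classes, each row's diagonal share is that class's correct-classification rate. A small sketch using the published counts:

```python
def overall_accuracy(matrix):
    """Share of records on the diagonal of a square coincidence matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_accuracy(matrix):
    """Per-class share of correct predictions when rows are actual classes."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


training = [[49, 0, 3], [0, 224, 19], [0, 27, 426]]
testing = [[18, 0, 2], [0, 115, 4], [0, 17, 161]]
print(f"{overall_accuracy(training):.2%}")  # 93.45%
print(f"{overall_accuracy(testing):.2%}")   # 92.74%
```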
  51. Conclusion of the Casting Work. Rules induced by decision trees (DT) were instrumental in planning new controlled experiments. Process optimization may be sought based on these field experiments. DT-induced rules may also be used to set tolerance levels for the uncontrollable features (variables).
  52. Suggested Factor Levels ("Controllable?": Yes/No; "fixed in [a, b]" means keep the factor fixed within that range).
     Factor | Controllable? | Adjusted setting | Observed range | Suggested trial range | Pertinent defect types | Suggested setting
     x2 | No | [15, 30] | [20, 28] | [23, 28] | (y2), (y3), (y6), (y8) | [23, 28] if possible
     x3 | No | [15, 30] | [30, 40] | [31, 37.5] | y1, y3 | [31, 37.5] if possible
     x4 | Yes | [13, 15] | [12.171, 13.678] | [12.295, 13.678] | y1 | fixed in [12.295, 13.678]
     x5 | Yes | [14, 16] | [12.27, 13.66] | [12.27, 13.165] | y8 | fixed in [12.27, 13.165]
     x6 | Yes | [7.5, 9.5] | [7.585, 8.25] | [7.917, 8.25] | y8 | fixed in [7.917, 8.25]
     x8 | Yes | [35, 42] | [21.75, 42] | [21.75, 35] | y3, (y2) | fixed in [21.75, 35]
     x9 | Yes | [3, 3.5] | [2.98, 3.387] | none | y2, y3, y6, y8 | 3 levels: [3.183, 3.216], [3.216, 3.26], [3.26, 3.387]
     x11 | Yes | [18, 23] | [19.8, 22.9] | [20.339, 22.9] | y3 | fixed in [20.339, 22.9]
     x12 | Yes | [250, 400] | [290, 360] | [350, 360] | y2 | fixed in [350, 360]; if not possible, [305, 360]
     x14 | Yes | [3.5, 5.5] | [4.7, 5.2] | [4.724, 5.2] | y2 | fixed in [4.724, 5.2]
     x16 | No | [11, 23] | [13.2, 30] | [15.86, 30] | y1, (y2) | [15.86, 30] if possible
     x17 | No | [11, 23] | [15.9, 31.5] | [26.55, 31.5] | y1 | [26.55, 31.5] if possible
     x19 | No | [11, 23] | [14.1, 24.9] | none | y2 | left to vary on its own
     x20 | Yes | 40 | [38.992, 42.85] | [38.992, 41.32] | y3 | fixed in [38.992, 41.32]
     x21 | Yes | 50 | [48.68, 52.71] | [49.181, 52.71] | y9 | fixed in [49.181, 52.71]
     x22 | Yes | 12 until 28 March; 22 after 31 March | until 28 March: [10.85, 14.35]; after 31 March: [20.05, 33.428] | none | y1, y2, y3, y6 | 4 levels: [10.85, 13.125], [12.275, 14.35], [14.35, 17.2], [17.2, 33.42]
     x25 | No | no range | [2.5, 6.9] | [2.5, 6.533] | y8 | [2.5, 6.533] if possible
     x26 | Yes | [1420, 1430] | [1367.59, 1428.23] | [1367.59, 1425.98] | y8, y9 | fixed in [1367.59, 1425.98]
     x27 | No | no range | [2.259, 4.95] | [2.259, 4.2] | y2, (y3) | [2.259, 4.2] if possible
     x28 | No | no range | [11.7, 16.9] | none | y3, y6 | left to vary on its own
     x29 | Yes | [3.2, 3.35] | [3.208, 3.41] | not available | y1, y3, y6, y8 | 3 levels: [3.208, 3.304], [3.304, 3.325], [3.355, 3.41]
     x30 | Yes | [1.85, 2] | [1.823, 2] | none | y1, y2, y3 | 2 levels: [1.823, 1.88], [1.88, 2]
     x32 | Yes | [0.2, 0.3] | [0.171, 0.283] | none | y1, y2 | 2 levels: [0.171, 0.184], [0.184, 0.283]
     x33 | Yes | maximum 0.3 | [0.0767, 0.552] | [0.174, 0.552] | y2 | fixed in [0.174, 0.552]
     x35 | Yes | [0.08, 0.12] | [0.0762, 0.1122] | [0.088, 0.1122] | y1 | fixed in [0.088, 0.1122]
     (Slide source: METU-IE and TU/e-OPAC Workshop, June 2007.)
  53. DRIVER SEAT DESIGN OPTIMIZATION PROBLEM. Questionnaire data: 80 observations (subjects); 28-88 input variables (age, sex, distance travelled, anthropometric measures, ease of use, attractiveness, etc.); 1-53 output variables (back comfort, thigh comfort, overall satisfaction, ease of use, attractiveness, etc.).
  54. Rules for Customer Satisfaction.
     - Rule for 7/7 (very satisfied) (support = 4; confidence = 1.0): IF lumbar ache after driving for a long time = 0 AND Video gray as the seat cover design = 1 AND accepts paying more for the seat belt sensor = 0 AND adequate support by the seat cushion = 1 THEN 7.0 (very satisfied).
     - Rule for 6/7 (satisfied) (support = 10; confidence = 1.0): IF lumbar ache after driving for a long time = 0 AND Video gray as the seat cover design = 1 AND accepts paying more for the seat belt sensor = 0 THEN 6.0 (satisfied).
     - Rule for 4/7 (normal) (support = 8; confidence = 0.75): IF lumbar ache after driving for a long time = 0 AND easy reach to the lumbar support adjustment = 0 THEN 4.0 (normal).
  55. Neural Network Modeling
  56. Neural Network Modeling: General. A neural network (NN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. It relies on learning rather than programming, and on parallel rather than sequential processing. Neural networks resemble the human brain in two respects: the network acquires knowledge from its environment through a learning process (algorithm), and synaptic weights, which are the inter-neuron connection strengths, store the learned information.
  57. General Topology. [Figure: input layer, hidden layers and output layer.]
  58. Inside the Node. A node receives inputs $x_1, \dots, x_m$ with weights $w_1, \dots, w_m$ and a bias $b$. Its base function (summing unit) computes the net input $net = \sum_{i=1}^{m} w_i x_i + b$; the activation function $f$ is then applied to the net input, and the node outputs $y = f(net)$.
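The node computation above fits in a few lines. The slide does not name the activation function, so this sketch assumes the logistic sigmoid as a default and lets the caller pass any other function.

```python
import math


def logistic(net):
    """Logistic sigmoid (an assumed default; the slide leaves f unspecified)."""
    return 1.0 / (1.0 + math.exp(-net))


def node_output(x, w, b, activation=logistic):
    """One node: base (summing) function net = sum_i w_i * x_i + b, then y = f(net)."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(net)


print(node_output([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```

Passing `activation=lambda n: n` turns the node into a plain weighted sum, which is convenient for checking the base function on its own.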
  59. Properties. Capabilities: fault tolerance; robustness; nonlinear mapping; learning and generalization; optimization. Design issues: number of source (input) nodes; number of hidden layers; number of hidden nodes per hidden layer; training data (too much: overfitting; too little: inaccurate classification); number of classes (sink nodes); interconnections; activation function; learning technique; stopping criteria.
  60. Application 1: Classification of Quality in Casting. Data: 36 continuous input variables; 1 categorical output variable with 3 levels (1: the first defect type exists; 2: the second defect type exists; 0: neither of these two defect types exists). Partition: training 70%, testing 30%. Learning rule: back-propagation. Network topology: input layer (36 neurons), hidden layer (6 neurons), output layer (1 neuron). To prevent overfitting, the training set was divided again into training and testing sets (partitioning the partition); the network was trained on the inner training set, and the error was evaluated on the inner test set at each cycle.
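The "partitioning the partition" safeguard is a form of early stopping. The sketch below illustrates the idea on a single sigmoid unit trained by gradient descent (not the 36-6-1 network or Clementine's implementation): the training partition is split again, and the weights with the lowest inner-validation error across cycles are kept. The 70/30 inner split, learning rate and toy data are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse(w, b, data):
    """Mean squared error of a single sigmoid unit on (x, y) pairs."""
    return sum((sigmoid(w * x + b) - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(data, epochs=200, lr=0.5, inner_frac=0.7, seed=0):
    """Train on an inner training set; keep the weights that minimize the error
    on the inner held-out set, evaluated after every cycle (epoch)."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    cut = int(inner_frac * len(data))
    inner_train, inner_val = data[:cut], data[cut:]  # partition the training partition
    w, b = 0.0, 0.0
    best_err, best_w, best_b, best_epoch = mse(w, b, inner_val), w, b, 0
    for epoch in range(1, epochs + 1):
        for x, y in inner_train:
            p = sigmoid(w * x + b)
            delta = (p - y) * p * (1.0 - p)  # dE/dnet for E = (p - y)^2 / 2
            w -= lr * delta * x
            b -= lr * delta
        err = mse(w, b, inner_val)
        if err < best_err:
            best_err, best_w, best_b, best_epoch = err, w, b, epoch
    return best_w, best_b, best_epoch, best_err


toy = [(x / 10, 1 if x > 0 else 0) for x in range(-10, 11) if x != 0]
w, b, epoch, err = train_with_early_stopping(toy)
```

The returned epoch is the stopping point; in a multilayer network the same bookkeeping applies to the full weight set.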
  61. Results. Coincidence matrix for predicted categories:
       Training:     0     1     2
               0    33     0     3
               1     0   158    13
               2     0    27   344
       Testing:      0     1     2
               0    18     0     0
               1     0    51    11
               2     0    19   132
     Overall predicted accuracy: training 92.56%, testing 87.01%. [Figure: gain chart.]
  62. Application 2: Prediction of Quality in Casting. Data: 36 continuous input variables; 1 output variable (percentage of defectives for a certain defect type). Partition: training 70%, testing 30%. Learning rule: back-propagation. Method: exhaustive prune (finds the best topology). Final network topology: input layer (36 neurons), first hidden layer (25 neurons), second hidden layer (17 neurons), output layer (1 neuron).
  63. Results. Estimated accuracy: 99.95%. Training results are slightly better than testing results, indicating some overfitting.
  64. Conclusion. Neural networks can be used for both classification and prediction. Unlike decision trees, neural networks are black-box models. To decide on the best production regions, further study may be needed (simulation, DOE, etc.).
  65. CLUSTERING
  66. Clustering: General. Clustering is a method by which a large set of data is grouped into clusters, that is, smaller sets of similar data. [Figure: an example clustering of balls.]
  67. Clustering Algorithms. A clustering algorithm attempts to find natural groups of components (or data) based on some similarity. Clustering algorithms find k clusters such that the objects in one cluster are similar to each other, whereas objects in different clusters are dissimilar.
  68. Taxonomy of Clustering Approaches
  69. Hierarchical vs. Partitional. A hierarchical algorithm partitions the data set in a nested manner into clusters that are either disjoint or contained in one another. These algorithms are either agglomerative or divisive, according to their algorithmic structure and the operation they carry out. A partitional method assumes that the number of clusters to be found is given in advance and then looks for the optimal partition with respect to an objective function.
  70. Nonsmooth Optimization. Most clustering problems reduce to solving nonsmooth optimization problems. Nonsmooth optimization problem: minimize $f(x)$ subject to $x \in \mathbb{R}^n$, where $f$ is nonsmooth at many points of interest, i.e., it does not have a conventional derivative at these points. Less restrictive assumptions on $f$ than smoothness are convexity and Lipschitz continuity.
  71. Cluster Analysis via Nonsmooth Optimization. Given instances $a^1, \dots, a^m \in \mathbb{R}^n$, the problem is to group them into $k$ clusters. This is a clustering problem of the partitioning type; we will reformulate it as a nonsmooth optimization problem.
  72. Cluster Analysis via Nonsmooth Optimization (cont'd). $k$ is the number of clusters (given); $m$ is the number of instances (given); $x^j \in \mathbb{R}^n$ is the center of the $j$-th cluster (to be found); $w_{ij}$ is the association weight of instance $a^i$ with cluster $j$ (to be found), and $W = (w_{ij})$ is an $m \times k$ matrix. The objective function has many local minima.
  73. Cluster Analysis via Nonsmooth Optimization (cont'd): if k is not given a priori. Start from a sufficiently small number of clusters k and gradually increase it until a stopping criterion is met. That is, if the solution of the corresponding optimization problem is not satisfactory, the decision maker considers a problem with k + 1 clusters, and so on. Consequently, one needs to solve a sequence of optimization problems with different values of k, which is an even more challenging task.
  74. Cluster Analysis via Nonsmooth Optimization (cont'd). Reformulated problem: the objective function is complicated, being both nonsmooth and nonconvex. The number of variables in the reformulated nonsmooth optimization problem is k×n, whereas before it was (m+n)×k. This problem can be solved by related nonsmooth methods (e.g., semidefinite programming, the discrete gradient method).
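The slides' formulas did not survive extraction. Assuming the standard minimum sum-of-squares clustering formulation for instances $a^1, \dots, a^m \in \mathbb{R}^n$, which agrees with the definitions of $k$, $m$, $x^j$ and $W$ and with the variable counts stated above, the original and reformulated problems can be written as follows (a sketch, not necessarily the exact form on the original slides):

```latex
% Original problem: centers x^1,...,x^k in R^n and an m x k association matrix W = (w_ij)
\min_{x,\,W}\; \psi(x, W) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} w_{ij}\,\lVert x^{j} - a^{i}\rVert^{2}
\quad\text{s.t.}\quad \sum_{j=1}^{k} w_{ij} = 1,\;\; w_{ij}\in\{0,1\},\;\; i = 1,\dots,m
% variables: n k center coordinates plus m k weights, i.e. (m + n) x k

% Reformulated nonsmooth, nonconvex problem in the centers only (k x n variables)
\min_{x \in \mathbb{R}^{n \times k}}\; f(x) = \frac{1}{m}\sum_{i=1}^{m}\,\min_{j=1,\dots,k} \lVert x^{j} - a^{i}\rVert^{2}
```

The inner minimum over $j$ is the source of the nonsmoothness: $f$ is generally not differentiable where an instance is equidistant from two of its nearest centers.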
  75. Clustering Analysis on RKN Casting Data. We used k-means, PAM (Partitioning Around Medoids) and k-means improved by nonsmooth optimization to identify homogeneous groups in the data. k-means: the grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroids. PAM: a medoid is an object of the cluster whose average distance to all the objects in the cluster is minimal. k-means improved by nonsmooth optimization: a k-means algorithm that solves a nonsmooth optimization subproblem to calculate the starting point for the k-th cluster center.
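A minimal sketch of the standard (Lloyd) k-means iteration and of the PAM medoid definition above. The initial centers are supplied by the caller; the nonsmooth-optimization variant differs precisely in how it computes these starting points, which is not shown here. Names and toy points are illustrative.

```python
import math


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid updates until stable."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Centroid = coordinate-wise mean; an empty cluster keeps its old center
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def medoid(cluster):
    """PAM-style medoid: the member with minimal average distance to the others."""
    return min(cluster, key=lambda c: sum(math.dist(c, o) for o in cluster))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, [points[0], points[3]])
print(labels)              # [0, 0, 0, 1, 1, 1]
print(medoid(points[:3]))  # (0, 0)
```

Because Lloyd's iteration only reaches a local minimum of the objective, results depend on the starting centers, which is what motivates computing better starting points.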
  76. Results (cluster sizes, in number of objects):
     - k-means: k=2: 70, 22; k=3: 68, 22, 2; k=4: 68, 16, 6, 2.
     - PAM: k=2: 40, 52; k=3: 33, 34, 25; k=4: 20, 34, 25, 13.
     - k-means improved by nonsmooth optimization (NSO): k=2: 61, 31; k=3: 61, 31, 2; k=4: 45, 24, 2, 21.
  77. Results. Cross-tabulation of cluster memberships:
     k-means (k=2) vs. PAM (k=4):
                        PAM 1  PAM 2  PAM 3  PAM 4  Total
       k-means 1          20     12     25     13     70
       k-means 2           0     22      0      0     22
       Total              20     34     25     13     92
     k-means (k=2) vs. k-means improved by nonsmooth optimization (NSO, k=2):
                        NSO 1  NSO 2  Total
       k-means 1          61      9     70
       k-means 2           0     22     22
       Total              61     31     92
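Cross-tabulations like the ones above simply count how many objects fall into each pair of cluster labels from two methods. A short sketch; the label vectors are hypothetical, constructed only so that the k-means vs. NSO counts from the table are reproduced.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Contingency table: rows = clusters of method A, columns = clusters of method B."""
    counts = Counter(zip(labels_a, labels_b))
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical label vectors built only to reproduce the k-means vs. NSO counts
kmeans_labels = [1] * 70 + [2] * 22
nso_labels = [1] * 61 + [2] * 9 + [2] * 22
print(crosstab(kmeans_labels, nso_labels))  # [[61, 9], [0, 22]]
```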
  78. Results. The tables above show the relations between the different clustering results. The optimal partitioning with PAM is obtained for k=4, whereas for the other methods k=2 gives the best results. For k=3 and k=4 with k-means, the clusters of 2 and 6 objects are artificial. These results agree with our preprocessing studies (Catherine Sugar's "jump method" and PCA), which suggested that k is 2 or 4 for our data.
  79. Jump Method and PCA. [Figures: transformed distortion vs. number of clusters (jump method); PCA.]
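A sketch of the jump method's selection step, assuming the distortions $d_1, \dots, d_K$ have already been computed (for example, the average per-dimension squared distance to the nearest center after running k-means for each k). It uses the commonly suggested transformation power $Y = p/2$ for $p$-dimensional data and sets $d_0^{-Y} = 0$; the distortion values in the usage line are hypothetical.

```python
def jump_method(distortions, p):
    """Pick k from distortions d_1..d_K (d_k = distortions[k-1]) of p-dimensional data.

    Transformed distortions are d_k ** (-Y) with Y = p / 2, and d_0 ** (-Y) = 0;
    the chosen k maximizes the jump J_k = d_k ** (-Y) - d_{k-1} ** (-Y).
    """
    y = p / 2
    transformed = [0.0] + [d ** (-y) for d in distortions]
    jumps = [transformed[k] - transformed[k - 1] for k in range(1, len(transformed))]
    best_k = max(range(1, len(transformed)), key=lambda k: jumps[k - 1])
    return best_k, jumps


best_k, jumps = jump_method([4.0, 1.0, 0.9, 0.85], p=2)  # hypothetical distortions
print(best_k)  # 2
```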
  80. Association Rule Mining
  81. Association Analysis. Association rule mining searches for interesting relationships among the features in a given data set. A typical example of association rule mining is "market basket analysis", which analyzes customer buying habits by finding associations between the different items that customers place in their "shopping baskets".
  82. Support and Confidence.
     - Association rules are statements of the form IF antecedent(s) THEN consequent(s), where the antecedent(s) and consequent(s) are disjoint conjunctions of feature-value pairs.
     - Two common measures, support and confidence, are used to evaluate the extracted rules. For a rule X => Y:
     - the support of the rule is the joint probability of X and Y, Pr(X and Y);
     - the confidence of the rule is the conditional probability of Y given X, Pr(Y | X).
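The two measures can be estimated from relative frequencies over a set of transactions, as in this sketch. The board item sets are hypothetical; the failure names are borrowed from the PCB example only for readability.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset: Pr(X and Y)."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Pr(Y | X) = support(X and Y) / support(X); X must occur in some transaction."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


boards = [  # hypothetical failure sets, one per board
    {"display error", "flash-not-loading"},
    {"display error"},
    {"flash-not-loading", "display error", "audio error"},
    {"audio error"},
]
print(support(boards, {"flash-not-loading", "display error"}))         # 0.5
print(confidence(boards, {"flash-not-loading"}, {"display error"}))    # 1.0
```

Note that confidence is not symmetric: here "display error" => "flash-not-loading" has confidence 0.5 / 0.75, about 0.67.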
  83. PCB Assembly Line
  84. PCB Assembly Line (cont'd)
  85. PCB Manufacturing Data in Transactional Format. In this format, a single board can appear in more than one row, each representing a different operation performed on the product. The serial number can be used as the transaction ID that distinguishes different products. Attributes (variables) of the boards: product type; description of the failure (observed during the final electrical test); root cause (identified during repair); location of the root cause; board type; supplier; operation line where the failure is detected; date and time.
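Before association rules can be mined, the row-per-operation records have to be grouped into one item set per board, using the serial number as the transaction ID. A sketch with hypothetical field names ("serial", "failure") and made-up rows:

```python
from collections import defaultdict


def to_transactions(rows, id_field="serial", item_field="failure"):
    """Group transactional-format rows into one item set per transaction ID."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[id_field]].add(row[item_field])
    return dict(baskets)


rows = [  # hypothetical rows: one row per operation/failure record
    {"serial": "S1", "failure": "display error"},
    {"serial": "S1", "failure": "flash-not-loading"},
    {"serial": "S2", "failure": "AUX1 error"},
]
print(to_transactions(rows))
```

Other attributes (supplier, root-cause location, line) can be added to the same item sets, for example as "supplier=..." strings, to mine the cross-attribute rules listed below.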
  86. Attributes: 11 types of PCB; 38 possible failures (e.g., display error, software error, no audio); 13 possible root causes (e.g., chip without solder, resistance is upright, short circuit); location of the root cause on the board; 9 board types; 6 different suppliers.
  87. Application: PCB Manufacturing. Sample records from the PCB manufacturing data:
     Board type | Serial | Supplier | Failure | Reason of failure | Location
     1 | 2459 | GOODBOARD | display error | no solder | U45 6.PIN
     1 | 736 | TATCHUN-GIA TZOONG | AUX1 error | short circuit | U8 2.PIN
     4 | 990 | GIA TZOONG | device-not-work | sw | L71
     3 | 700 | TATCHUN-GIA TZOONG | display error | short circuit | R407
     6 | 712 | ÜNAL ELEKTRONİK | rgb-cvbs error | flash error | R412
     2 | 1411 | GOODBOARD | sw error | upright | K23
     2 | 663 | GOODBOARD-TATCHUN | AUX1 error | no solder | C130
     7 | 627 | UNIWELL ELECTRONIC | audio error | upside-down | B353
     4 | 1169 | GOODBOARD | sw error | sw | U6
  88. Possible Applications of Association Analysis: identifying failure types that occur together on the same board; associating failures with root causes; associating failures with suppliers; identifying failures that occur in sequence; associating failures with the location of the root cause on the board.
  89. Identifying Failure Types That Occur Together on the Same Board. "device-not-functioning" => "flash-not-loading" (25%, 73%); "flash-not-loading" => "display error" (36%, 86%); "AUX1 error" AND "feed error" => "audio error" (32%, 61%).
  90. Association of Failures with Root Causes. "upright" AND Location = "Chip" => "audio error" (46%, 82%); "no solder" => "device-not-functioning" (18%, 100%).
  91. Association of Failures with Suppliers. "GOODBOARD" => "display error" (23%, 57%); "UNIWELL" AND "GOODBOARD" => "feed error" (18%, 53%).
  92. Identifying Failures Dependent on the Sequence of Operations. Line 1 = "AUX1 error" => Line 5 = "feed error" (22%, 48%).
  93. Association of Failures with the Location of the Root Cause on the Board. "device-not-functioning" => Location = "resistance" (56%, 76%); "flash-not-loading" => Location = "U8 2.PIN" (43%, 66%).
  94. Regression
  96. CONCLUSION. Difficult QI problems with many input and output variables can be handled effectively with DM approaches. Observational or experimental data, preferably in large volumes, are needed, and online data collection systems might need to be installed. Data quality and preprocessing are crucial. Many tools seem difficult for industry practitioners to apply (advanced training might be necessary). Results in the form of rules are found useful and interesting by industry.
  97. FUTURE WORK. Continue collecting data sets for different QI problems and applying DM approaches to them. Also apply other DM approaches, such as linear/robust regression, fuzzy clustering/regression and rough set theory, and compare their performance. Develop new or improved DM algorithms for solving QI problems: multi-response decision tree modeling; nonsmooth optimization for categorical quality responses; improved MARS with Tikhonov regularization.
  98. PAPERS AND PRESENTATIONS FROM THE PROJECT
     - Bakır, B., Batmaz, İ., Güntürkün, F.A., İpekçi, İ.A., Köksal, G., and Özdemirel, N.E., "Defect Cause Modeling with Decision Tree and Regression Analysis", Proceedings of the XVII International Conference on Computer and Information Science and Engineering, Cairo, Egypt, December 8-10, 2006, Vol. 17, pp. 266-269, ISBN 975-00803-7-8.
     - İpekçi, A.İ., Bakır, B., Batmaz, İ., Testik, M.C., and Özdemirel, N.E., "Defect Cause Modeling with Data Mining: Decision Trees and Neural Networks", to appear in Proceedings of the 56th Session of the International Statistical Institute, Lisbon, Portugal, August 22-29, 2007.
     - Akteke-Öztürk, B., and Weber, G.W., "A Survey and Results on Semidefinite and Nonsmooth Optimization for Minimum Sum of Squared Distances Problem", Technical Report, 2007.
     - Öztürk-Akteke, B., Weber, G.W., and Kayalıgil, S., "Kalite İyileştirmede Veri Kümeleme: Döküm Endüstrisinde Bir Uygulama" [Data Clustering in Quality Improvement: An Application in the Casting Industry], 27th National Congress of Operations Research and Industrial Engineering (YA/EM 2007), İzmir, Turkey, July 2-4, 2007.
  99. PAPERS AND PRESENTATIONS FROM THE PROJECT (cont'd). Session TC-38, Tutorial Session: Data Mining Applications in Quality Improvement, 22nd European Conference on Operational Research, Prague, July 7-11, 2007:
     - Köksal, G., Testik, M.C., Güntürkün, F.A., and Batmaz, İ., "Data Mining Applications in Quality Improvement: A Tutorial and a Literature Review".
     - İpekçi, A.İ., Köksal, G., Karasakal, E., Özdemirel, N.E., and Testik, M.C., "Multi Response Decision Tree Approach Applied to a Discrete Manufacturing Quality Improvement Problem".
  100. PAPERS AND PRESENTATIONS FROM THE PROJECT (cont'd)
     - Köksal, G., Testik, M.C., Güntürkün, F.A., and Batmaz, İ., "Kalite İyileştirmede Veri Madenciliği Yaklaşımları ve Bir Uygulama" [Data Mining Approaches in Quality Improvement and an Application], 16th National Quality Congress, İstanbul, November 12, 2007.