
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry



With the development of semiconductor devices, manufacturing systems have improved productivity and efficiency for wafer fabrication. Owing to this improvement, the number of wafers yielded from the fabrication process has been increasing rapidly. However, current software systems for semiconductor wafers are not designed to process large numbers of wafers. To resolve this issue, BISTel (a provider of manufacturing intelligence solutions and services for manufacturers) has built several big data products, such as Trace Analyzer (TA) and Map Analyzer (MA), using Apache Spark. TA analyzes raw trace data from a manufacturing process. It captures details on all variable changes, big and small, and gives each trace's statistical summary (e.g., min, max, slope, average). Several of BISTel's customers, which are among the top-tier semiconductor companies in the world, use TA to analyze the massive raw trace data from their manufacturing processes. In particular, TA can manage terabytes of data by applying Apache Spark's APIs. MA is an advanced pattern-recognition tool that sorts wafer yield maps and automatically identifies common yield-loss patterns. Some semiconductor companies use MA to identify clustering patterns for more than 100,000 wafers, which can be considered big data in the semiconductor domain. This talk introduces these two products, which are built on Apache Spark, and presents how to handle large-scale semiconductor data from a software engineering perspective.

Speakers: Seungchul Lee, Daeyoung Kim




  1. Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Case of Apache Spark for Semiconductor Wafers from Real Industry
     Seungchul Lee, sclee@bistel.com, BISTel Inc.
     Daeyoung Kim, dykim3@bistel.com, BISTel Inc.
     #UnifiedAnalytics #SparkAISummit
  2. Contents
     • Introduction to BISTel
       – BISTel's business and solutions
       – Big Data for BISTel's smart manufacturing
     • Use cases of Apache Spark in the manufacturing industry
       – Trace Analyzer (TA)
       – Map Analyzer (MA)
  3. Introduction to BISTel
  4. BISTel's business areas
     • Providing analytic solutions based on Artificial Intelligence (AI) and Big Data to customers building smart factories
  5. BISTel's solution areas
     • World-class manufacturing intelligence through innovation
  6. BISTel's analytic solution: eDataLyzer
  7. BISTel's analytic solutions (MA)
     • Map Pattern Clustering
       – Automatically detects and classifies map patterns with or without libraries
       – Processes thousands of wafers and gives results in a few minutes
     [Figure: clustered defective wafers]
  8. BISTel's analytic solutions (TA)
     • Specialized application for trace raw data
       – Extracts the vital signs from equipment trace data
       – Provides in-depth analysis that traditional methods cannot reach
     [Figure: abnormal vs. normal traces]
  9. BISTel's big data experiences
  10. BISTel's big data experiences
      [Figures: YMA test using Spark; big data platform comparison]
  11. Trace Analyzer (TA)
  12. Trace Data
      • Trace data is sensor data collected from processing equipment within a semiconductor fab during a process run.
      [Figures: semiconductor industry; wafer]
  13. Logical hierarchy of the trace data
      [Diagram: Process → Recipe → Recipe Step → Lot → Wafer, visualized from the whole process down to one wafer]
  14. An example of the trace data

      Process | Recipe  | Recipe step | Lot     | Wafer | Param1 | Param2 | Time
      021_LIT | RecipeA | 1           | 1501001 | 1     | 32.5   | 45.4   | 2015-01-20 09:00:00
  15. Data attributes
      • Base unit: one process and one parameter
      • 1,000 wafers
      • Each wafer has 1,000~2,000 data points in a recipe step
      • Factors that make trace data huge in volume:
        – # of parameters
        – # of processes
        – # of wafers
        – # of recipe steps
        – duration of the recipe step
  16. An example of the trace data – (2)

      No. | Fab   | # of processes | # of recipe steps | Avg. recipe process time | Data frequency | # of units | Parameters per unit (max)
      1   | Array | 109            | 10                | 16 mins                  | 1 Hz           | 288        | 185
      2   | CF    | 25             | 5                 | 1 min                    | 1 Hz           | 154        | 340
      3   | CELL  | 12             | 7                 | 1 min                    | 1 Hz           | 213        | 326
      4   | MDL   | 5              | 12                | 2 mins                   | 1 Hz           | 32         | 154

      • Some calculations
        – For one process, one parameter, and one wafer: 16 * 10 * 60 sec * 1 Hz = 9,600 points
        – For multiple parameters, processes, and wafers: 9,600 * 288 * 185 * 109 * (# of wafers)
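The volume arithmetic on this slide can be checked directly. This is a sketch only; it uses the Array fab row of the table above and, like the slide's own calculation, treats the 16 minutes as a per-step duration:

```java
public class TraceVolume {
    // Data points for one parameter of one wafer:
    // minutes per step * steps * 60 s/min * sampling rate (Hz)
    static long pointsPerWafer(int minutesPerStep, int steps, int hz) {
        return (long) minutesPerStep * steps * 60 * hz;
    }

    // Total points for one wafer across all units, parameters, and processes
    static long totalPerWafer(long perParam, int units, int paramsPerUnit, int processes) {
        return perParam * units * paramsPerUnit * processes;
    }

    public static void main(String[] args) {
        long p = pointsPerWafer(16, 10, 1);            // Array fab row
        System.out.println(p);                         // 9600
        System.out.println(totalPerWafer(p, 288, 185, 109)); // 55752192000
    }
}
```

At roughly 55.7 billion points per wafer for the Array fab, even a modest wafer count lands in the terabyte range, which is why a single-machine TA cannot keep up.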
  17. Spark: smart manufacturing
      • Spark is well suited to processing big data in batch analytics
      • Distributing data by parameter is a natural fit for Apache Spark
      • Easy deployment and scalability when providing the solutions to our customers
  18. Naïve way: applying Spark to TA
  19. How to apply Spark to TA?

      traceDataSet = config.getTraceRDDs().mapToPair(t -> {
          String recipeStepKey = TAUtil.getRecipeStepKey(t); // use the recipe step as the key
          return new Tuple2<String, String>(recipeStepKey, t);
      }).groupByKey();

      traceDataSet.flatMap(t -> {
          Map<String, TraceDataSet> allTraceData = TAUtil.getTraceDataSet(t);
          ...
          TAUtil.separateFocusNonFocus(allTraceData, focus, nonFocus); // separate focus/non-focus data
          ta.runTraceAnalytic(focus, nonFocus, config); // call the TA core
          ...
      });
  20. Most cases in the manufacturing industry
      • In real industry, most parameters have a small number of data points (most commonly 1 Hz).
      • In addition, the number of wafers to be analyzed is not massive (up to 1,000 wafers).
      • Therefore the total number of data points in a process can easily be processed on a single core.
  21. Issues in the manufacturing industry
      • Last year, we received an email indicating that..
  22. Big parameter
      • Tools with a high sampling frequency or a long recipe time can produce a huge data volume for a single parameter
      • Requirements in industry, for one parameter:
        – 400,000 wafers
        – 20,000 data points
  23. Limitations of the naïve TA

      for (Tuple2<String, Iterable<String>> recipeTrace : allTraceData) {
          TraceDataSet ftds = new TraceDataSet();
          Iterable<String> oneRecipe = recipeTrace._2();
          for (String tr : oneRecipe) {
              TraceData td = TAUtil.convertToTraceData(tr);
              ftds.add(td);
          }
      }

      traceDataSet = config.getTraceRDDs().mapToPair(t -> {
          String recipeStepKey = TAUtil.getRecipeStepKey(t); // use the recipe step as the key
          return new Tuple2<String, String>(recipeStepKey, t);
      }).groupByKey();

      • All the data points for a key are pushed into one core by shuffling
      • The Java object holds too many data points
  24. Needs for a new TA Spark
      • The naïve TA Spark version cannot process massive numbers of data points.
      • Nowadays, new technology enhancements enable data capture at much higher frequencies.
      • A TA version for "big parameters" is necessary.
  25. Our idea is that..
      • Extracting the TA core logic
        – Batch mode
        – Key-based processing
        – Using .collect() to broadcast variables
        – Caching the object
  26. Batch
      • Preprocessing trace data
      • Key-based processing
      • Base unit: process key or recipe step key

      JavaPairRDD<String, List<String>> traceDataRDD = TAImpl.generateBatch(traceData);
      // First element: process, recipe step, parameter, and batch ID
      // Second element: lot, wafer, and trace values

      [Figure: per-batch summary statistics for Param A]
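A minimal sketch of the key-based batching idea outside Spark. The record layout and the helper names (`batchKey`, `batchValue`) are assumptions for illustration, not BISTel's `TAImpl` API; the point is only that the composite key concatenates process, recipe step, parameter, and batch ID, while the value carries lot, wafer, and trace values:

```java
public class BatchKeyDemo {
    // Assumed field layout of a parsed trace record:
    // [0]=process, [1]=recipeStep, [2]=parameter, [3]=lot, [4]=wafer, [5]=values
    static String batchKey(String[] rec, int batchId) {
        return String.join("|", rec[0], rec[1], rec[2], Integer.toString(batchId));
    }

    static String batchValue(String[] rec) {
        return rec[3] + "," + rec[4] + "," + rec[5];
    }

    public static void main(String[] args) {
        String[] rec = {"021_LIT", "1", "Param1", "1501001", "1", "32.5;45.4"};
        System.out.println(batchKey(rec, 0));   // 021_LIT|1|Param1|0
        System.out.println(batchValue(rec));    // 1501001,1,32.5;45.4
    }
}
```

With such a key, one batch of one parameter fits comfortably in a single task, which is what lets the "big parameter" TA avoid piling every data point for a recipe step onto one core.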
  27. Collect(): TA Cleaner
      • Filtering out traces that have an unusual process-time duration.
      • Uses the three main Spark APIs:
        – mapToPair: extract relevant information
        – reduceByKey: aggregate values based on the key
        – collect: send the data to the driver
  28. Collect(): TA Cleaner – (2)
      • traceData.mapToPair() returns key: process; value: wafer and its trace length
      [Figure: four workers, each holding (wafer, value) pairs, e.g. (1, 65), (2, 54), ...]
  29. Collect(): TA Cleaner – (3)
      • reduceByKey(): aggregates contexts into one based on the process key
      [Figure: per-worker (wafer, value) tables shuffled and summed into one table per process key]
  30. Collect(): TA Cleaner – (4)
      • Applying the filtering method in each worker

      mapToPair(t -> {
          String pk = t._1();
          Double[] values = toArray(t._2());
          FilterThresholds ft = CleanerFilter.filterByLength(values);
          return new Tuple2<>(pk, ft);
      }).collect();
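The cleaner's length filter might look like the following plain-Java sketch (no cluster needed): given each wafer's trace length under one process key, flag wafers whose length deviates from the mean. The 2-standard-deviation threshold is an assumption; the talk does not specify the rule inside `CleanerFilter.filterByLength`:

```java
import java.util.*;
import java.util.stream.*;

public class CleanerSketch {
    // Hypothetical stand-in for the length filter: flag wafers whose trace
    // length falls outside mean +/- 2 standard deviations for this process key.
    static Set<Integer> unusualWafers(Map<Integer, Integer> lengthByWafer) {
        double mean = lengthByWafer.values().stream()
                .mapToInt(Integer::intValue).average().orElse(0);
        double var = lengthByWafer.values().stream()
                .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
        double sd = Math.sqrt(var);
        return lengthByWafer.entrySet().stream()
                .filter(e -> Math.abs(e.getValue() - mean) > 2 * sd)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> lengths = new HashMap<>();
        for (int w = 1; w <= 20; w++) lengths.put(w, 960); // normal runs
        lengths.put(21, 10);                               // one truncated run
        System.out.println(unusualWafers(lengths));        // [21]
    }
}
```

In the Spark version, `mapToPair`/`reduceByKey` would gather the per-wafer lengths per key, and `collect()` would bring only the small threshold results back to the driver.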
  31. Example 2: Computing outliers
      • To detect outliers in a process, the median statistic is required.
      • To compute the median value, the values need to be sorted: sort(values)
  32. Example 2: Computing outliers – (2)
      • Computed an approximate median value for big data processing.
      • Applied a histogram for the median:
        – mapToPair / reduceByKey: build and aggregate the histogram
        – collect: collect the histogram at the driver
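A single-machine sketch of the histogram trick: each worker builds bin counts (what `mapToPair` would emit), the counts are summed per key (what `reduceByKey` would do), and only the small count array reaches the driver, where the approximate median is read off the cumulative counts. The fixed-width bin layout and bin-midpoint answer are assumptions:

```java
public class ApproxMedian {
    // Build a fixed-width histogram over [min, max] with the given bin count.
    static long[] histogram(double[] values, double min, double max, int bins) {
        long[] counts = new long[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            int b = (int) ((v - min) / width);
            if (b == bins) b--;              // clamp the max value into the last bin
            counts[b]++;
        }
        return counts;
    }

    // Approximate median: midpoint of the bin containing the middle value.
    static double approxMedian(long[] counts, double min, double max, long n) {
        double width = (max - min) / counts.length;
        long seen = 0;
        for (int b = 0; b < counts.length; b++) {
            seen += counts[b];
            if (seen >= (n + 1) / 2) return min + (b + 0.5) * width;
        }
        return max;
    }

    public static void main(String[] args) {
        double[] vals = new double[1001];
        for (int i = 0; i <= 1000; i++) vals[i] = i;       // true median is 500
        long[] h = histogram(vals, 0, 1000, 100);
        System.out.println(approxMedian(h, 0, 1000, vals.length)); // 505.0
    }
}
```

The error is bounded by the bin width, so no full distributed sort of billions of points is needed; narrowing the bins (or re-binning inside the median's bin) tightens the estimate.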
  33. Caching the trace data
      • Persist the trace data before applying the TA algorithm
      • Prevents reloading the data each time an action is performed

      focus = focus.persist(StorageLevel.MEMORY_AND_DISK());
      nonFocus = nonFocus.persist(StorageLevel.MEMORY_AND_DISK());
  34. RDD vs. Dataset (DataFrame)
      • RDD
        – All the data points in a process must be scanned, so the advantage of the Dataset is weakened.
        – Trace data is hard to manipulate using SQL:
          basic statistics (min, max, avg, count, ...) are easy, but advanced algorithms (Fast Fourier Transform, segmentation) are not.
  35. Demo: Running the TA algorithm
      • Analyzed 2TB of trace data using TA
  36. TA results in eDataLyzer
  37. Results of the naïve TA
  38. Results of the big-parameter TA Spark
  39. Two different TA Spark versions

                   | Data size | # of parameters | # of wafers | # of data points | Running time
      Naïve TA     | 2TB       | 270,000         | 250         | 1,000            | 1.1 h
      Big Param TA | 1TB       | 4               | 400,000     | 20,000           | 54 min
  40. Map Analyzer (MA)
  41. Map Analytics (MA)
      • Hierarchical clustering is used to find a defect pattern
      S.-C. Hsu, C.-F. Chien / Int. J. Production Economics 107 (2007) 88-103
  42. MA datasets

      Process | Process step | Parameter | Lot     | Wafer | Defective chips
      FPP     | Fall_bin     | P01       | 8152767 | 23    | -02,04|-01,22|+00,25|+08,33|+04,05

      waferDataSetRDD.mapToPair(...).groupBy().mapToPair(...);
      // Generating a key-value pair, then calling hierarchical clustering
  43. BISTel's first approach for MA
      • Using the batch mode for clustering massive numbers of wafers.
  44. Demo: Running the MA algorithm
      • The dataset consists of 26 parameters containing 120,000 wafers
  45. Problems in batch for clustering
      • In the manufacturing industry, some issues exist:

               | # of wafers | Time                      | Detecting a pattern
      DataSet1 | 15          | 2017-02-01 09:00 ~ 09:30  | Yes
      DataSet2 | 7,000       | 2017-02-01 ~ 2017-02-08   | No
  46. Spark Summit: SHCA algorithm
      • At Spark Summit 2017, Chen Jin presented a scalable hierarchical clustering algorithm using Spark.
  47. A SHCA algorithm using Spark
      Jin, Chen, et al. "A scalable hierarchical clustering algorithm using Spark." 2015 IEEE First International Conference on Big Data Computing Service and Applications. IEEE, 2015.
  48. Applying SHCA to wafer datasets
      • Designed the key-value pairs:

      Wafer map ID | Coordinates of defective chips
      A            | (13,22), (13,23), (13,24), (13,25)...
      B            | (5,15), (6,12), (6,17), (8,25)...
      C            | (9,29), (16,33), (19,39), (22,25)...
      D            | (19,9), (20,2), (23,21), (25,4)...
      E            | (5,5), (5,8), (5,15), (5,25)...

      • Minimum spanning tree (MST)
        – Vertex: wafer
        – Edge: distance between wafers, distance(w1, w2)
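The MST step above can be sketched on a single machine with Prim's algorithm over wafers. The Jaccard distance between defective-chip coordinate sets is an assumed metric for illustration; the talk does not state which wafer distance BISTel uses:

```java
import java.util.*;

public class WaferMst {
    // Assumed metric: Jaccard distance between two wafers' defective-chip sets.
    static double distance(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : 1.0 - (double) inter.size() / union.size();
    }

    // Prim's algorithm: parent[i] is wafer i's neighbor in the MST (parent[0] = -1).
    static int[] mst(List<Set<String>> wafers) {
        int n = wafers.size();
        int[] parent = new int[n];
        double[] best = new double[n];
        boolean[] done = new boolean[n];
        Arrays.fill(best, Double.MAX_VALUE);
        parent[0] = -1;
        best[0] = 0;
        for (int it = 0; it < n; it++) {
            int u = -1;                              // pick the closest unvisited wafer
            for (int v = 0; v < n; v++)
                if (!done[v] && (u == -1 || best[v] < best[u])) u = v;
            done[u] = true;
            for (int v = 0; v < n; v++) {            // relax edges out of u
                double d = distance(wafers.get(u), wafers.get(v));
                if (!done[v] && d < best[v]) { best[v] = d; parent[v] = u; }
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        List<Set<String>> wafers = List.of(
            Set.of("13,22", "13,23", "13,24"),   // A
            Set.of("13,22", "13,23", "13,25"),   // B: pattern similar to A
            Set.of("5,15", "6,12", "6,17"));     // C: different pattern
        System.out.println(Arrays.toString(mst(wafers))); // [-1, 0, 0]
    }
}
```

Cutting the MST's longest edges then yields the single-linkage clusters; SHCA's contribution is computing this tree in parallel by merging per-partition sub-MSTs, which is what makes 100,000+ wafers tractable.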
  49. Comparison between two versions
  50. Comparison between two versions – (2)
  51. Spark stage results of MA
      • Approximately 100,000 wafers are analyzed for clustering
  52. Comparison of the results
      [Chart: running time (0-2,500) vs. number of wafers (5,000 to 320k) for the batch MA and the new MA]
  53. Summary
      • The MA using SHCA is more accurate than the batch MA.
      • However, the batch MA runs faster than the new MA.
      • In the manufacturing industry, we suggest using both MA versions.
  54. Conclusions
      • A first use case of Apache Spark in the semiconductor industry
        – Terabytes of trace data are processed
        – Achieved hierarchical clustering on distributed machines for semiconductor wafers
  55. Acknowledgements
      • BISTel Korea (BK)
        – Andrew An
      • BISTel America (BA)
        – James Na
        – WeiDong Wang
        – Rachel Choi
        – Taeseok Choi
        – Mingyu Lu
      * This work was supported by the World Class 300 Project (R&D) (S2641209, "Development of next generation intelligent Smart manufacturing solution based on AI & Big data to improve manufacturing yield and productivity") of the MOTIE, MSS (Korea).
  56. DON'T FORGET TO RATE AND REVIEW THE SESSIONS
      SEARCH SPARK + AI SUMMIT
