Outlier Detection in Data Mining: An
Essential Component of Semiconductor
Manufacturing
https://yieldwerx.com/
Outlier detection is an important research area within data mining, with applications in fraud detection, cybersecurity, health
diagnostics, and, notably, semiconductor manufacturing. It means identifying data points that deviate significantly from expected
patterns, and such points often reveal faults or rare events hidden in the data. Several factors complicate the task: the boundary
between outliers and normal behavior is ambiguous, the definition of 'normal' evolves over time, techniques tend to be
application-specific, and noisy data can mimic outliers. This review surveys the main families of outlier detection methods and
points to directions for future research.
Defining Outliers
The term outlier refers to a data point that significantly deviates from the expected behavior or is substantially dissimilar from
others within a dataset. Various causes contribute to outliers, including mechanical faults, changes in system behavior, human
errors, and environmental alterations. The identification and handling of outliers remain a complex, ongoing process in machine
learning and data mining. This procedure often goes by numerous terms such as outlier mining, novelty detection, outlier
modeling, anomaly detection, and more.
Techniques for Outlier Detection
The approaches to identifying outliers are many and varied, each leveraging different principles for the purpose. Highlighted
below are the key methods of outlier detection:
Statistical-Based Methods
Statistical methods judge a data point by how far it deviates from a statistical model of the data. They assume that regular data
points occur in high-probability regions of a stochastic model, while outliers fall in low-probability regions.
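As a rough illustration of the statistical idea, the sketch below scores each reading with a modified z-score built from the median and the median absolute deviation (MAD). The choice of this particular statistic, the function names, the 3.5 threshold, and the sample readings are all assumptions made for the example, not something the article prescribes.

```python
from statistics import median

def robust_z_scores(values):
    """Modified z-score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: with zero spread nothing is scored as unusual.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

def statistical_outliers(values, threshold=3.5):
    """Indices whose absolute modified z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]

# Hypothetical readings; the value 14.0 sits far from the rest.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 14.0, 10.0]
flagged = statistical_outliers(readings)
print(flagged)  # [6]
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value shifts them far less.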
Distance-Based Methods
Distance-based methods judge a data point by its distance from other points, typically its nearest neighbors. An outlier, in this
context, is a data point that lies unusually far from the others.
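One common way to make this concrete is to score each point by the distance to its k-th nearest neighbor. The sketch below assumes Euclidean distance, k = 2, and a small made-up point set; none of these choices come from the article.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_distance_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index)  # 4
```

This brute-force version compares every pair of points, so it is only meant for small inputs.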
Density-Based Methods
This approach compares the density around a data point with the density around its neighbors. The central idea is that a data point
located in a low-density region, relative to denser regions nearby, is likely to be an outlier.
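The density idea can be sketched with a simplified density ratio, loosely in the spirit of local-outlier-factor style methods but not a faithful implementation of any of them. Here local density is assumed to be the inverse of the mean distance to the k nearest neighbors, and a point's score is its neighbors' average density divided by its own. All names and data are hypothetical.

```python
import math

def nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def local_density(points, i, k):
    """Inverse of the mean distance to the k nearest neighbors."""
    nbrs = nearest_neighbors(points, i, k)
    mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / k
    return 1.0 / max(mean_dist, 1e-12)  # guard against duplicate points

def density_ratio_scores(points, k=2):
    """Scores near 1 mean 'as dense as the neighbors'; larger means sparser."""
    dens = [local_density(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        nbrs = nearest_neighbors(points, i, k)
        scores.append(sum(dens[j] for j in nbrs) / k / dens[i])
    return scores

# Hypothetical data: a tight unit square and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = density_ratio_scores(points, k=2)
print([round(s, 2) for s in scores])
```

In this example the four square points score 1.0, while the isolated point scores well above 1 because its neighbors sit in a much denser region than it does.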
Clustering-Based Methods
Clustering-based techniques classify data points as outliers if they do not belong to any cluster or if they are far from their
nearest cluster centroid.
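The centroid-distance variant described above can be sketched with a small k-means loop. The clustering algorithm, the fixed starting centroids, the iteration count, and the data are all assumptions chosen to keep the example deterministic.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) from the given starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids

def centroid_distance_scores(points, centroids):
    """Score each point by its distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical data: two tight groups and one stray point at (5, 20).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
centroids = kmeans(points, centroids=[points[0], points[4]])
scores = centroid_distance_scores(points, centroids)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index)  # 8
```

Note that the stray point pulls its cluster's centroid toward itself, which is one reason centroid-based scores can understate how unusual a point is.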
Graph-Based Methods
By constructing a graph that represents the relationships among data points, graph-based methods identify outliers as nodes with
characteristics substantially different from others.
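One possible reading of the graph-based idea is to build a directed k-nearest-neighbor graph and treat nodes that few other nodes point to as outlier candidates. The graph construction, the use of in-degree as the distinguishing characteristic, and the data are assumptions made for this sketch.

```python
import math

def knn_graph(points, k=2):
    """Directed graph: each node has edges to its k nearest neighbors."""
    edges = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        edges[i] = others[:k]
    return edges

def in_degrees(edges):
    """Count how many nodes list each node among their neighbors."""
    counts = {node: 0 for node in edges}
    for neighbors in edges.values():
        for j in neighbors:
            counts[j] += 1
    return [counts[i] for i in sorted(counts)]

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
in_degree = in_degrees(knn_graph(points, k=2))
candidates = [i for i, d in enumerate(in_degree) if d == 0]
print(candidates)  # [4]
```

No point in the square lists the distant point among its nearest neighbors, so its in-degree is zero.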
Ensemble-Based Methods
These methods often combine multiple outlier detection techniques to produce a more robust and accurate detection process.
Learning-Based Methods
Often using supervised or semi-supervised machine learning models, these techniques learn the normal behavior patterns from
labeled data and classify the deviating instances as outliers.
Handling Outliers
Handling outliers remains a contentious topic. In some cases, outliers are viewed as erroneous data and discarded, but in other
instances, they are treated as integral parts of the dataset. Eliminating outliers from accurate data may lead to the loss of critical
information. Several techniques have been proposed for handling outliers, including visual examination, univariate and multivariate
methods, and reducing the influence of outliers during model training. Overall, the approach to handling outliers largely depends on the context and
often requires analytical reasoning, intuition, and deliberate decision-making.
Applications of Outlier Detection
Outlier detection is applied in many domains, including data and process logs, fraud and intrusion detection, security and
surveillance, healthcare and medical diagnostics, transactional data sources, sensor networks and databases, data quality and
cleaning, time-series monitoring and data streams, and the Internet of Things (IoT). In the semiconductor manufacturing industry in
particular, outlier detection can play a vital role in detecting anomalies in manufacturing processes, which in turn supports
quality control, fault detection, and lot control.
Emerging Techniques: Deep Learning and Ensemble Approaches
Recent years have seen increased interest in leveraging deep learning and ensemble techniques for outlier detection. Deep
learning-based approaches, primarily autoencoders and other deep neural networks (DNNs), have demonstrated promising results in
detecting complex and subtle outliers, especially in high-dimensional data. For example, an autoencoder, a popular deep learning
architecture, is trained to reconstruct its input data. The reconstruction error then serves as the anomaly score: a high
error indicates that the data point is hard to model and is therefore likely an outlier. Ensemble techniques combine multiple outlier detection models
to increase robustness and accuracy. They often use various base detection algorithms or multiple configurations of a single base
algorithm. The final decision is usually based on a majority vote, average, or another combination rule of the base detectors'
results. Both these techniques have promising applications in the semiconductor industry. They can detect minute faults or
anomalies in the manufacturing processes that may be overlooked by traditional methods, potentially saving significant resources
and increasing overall efficiency.
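The combination rules mentioned above (averaging and majority vote) can be sketched as follows. The three score lists stand in for the outputs of hypothetical base detectors, for instance reconstruction errors or distance scores; the min-max normalization and the 0.5 flagging threshold are assumptions of this example.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so different detectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def average_scores(score_lists):
    """Average the normalized scores of all detectors, point by point."""
    normalized = [min_max(s) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*normalized)]

def majority_vote(flag_lists):
    """Flag a point when more than half of the detectors flag it."""
    n = len(flag_lists)
    return [sum(col) > n / 2 for col in zip(*flag_lists)]

# Hypothetical raw scores from three base detectors for four data points.
detector_a = [1.0, 2.0, 9.0, 1.0]
detector_b = [0.2, 0.1, 0.8, 0.9]
detector_c = [5.0, 5.0, 50.0, 5.0]
all_scores = [detector_a, detector_b, detector_c]

combined = average_scores(all_scores)
flags = [[s > 0.5 for s in min_max(scores)] for scores in all_scores]
votes = majority_vote(flags)
print(votes)  # [False, False, True, False]
```

Detector b alone would also flag the last point, but the other two detectors outvote it, which illustrates how an ensemble can damp the errors of a single model.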
The Challenge of Scalability and the Role of Distributed Detection Techniques
As data size increases, the number of outliers and the computational cost of detection also increase, making the process slow and
costly. This is especially relevant in semiconductor manufacturing, where terabytes of data are generated
daily. Scalable outlier detection techniques therefore become necessary for large datasets.
To address this, distributed outlier detection techniques have been proposed. They partition the original data into several subsets and
distribute them across different nodes in a distributed system to process in parallel. After local outlier detection is performed on
each node, the results are aggregated to produce the outcome. These techniques are effective in managing large datasets, reducing
computational costs, and speeding up the detection process.
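The partition, detect locally, then aggregate pattern described above can be sketched on a single machine, with a thread pool standing in for the nodes of a distributed system. The local detector (a median/MAD rule), the contiguous partitioning, and the data are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def partition(values, n_parts):
    """Split values into contiguous chunks, keeping their global indices."""
    indexed = list(enumerate(values))
    size = max(1, -(-len(indexed) // n_parts))  # ceiling division
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]

def detect_local(chunk, threshold=3.5):
    """Flag global indices whose modified z-score within the chunk is large."""
    vals = [v for _, v in chunk]
    med = median(vals)
    mad = median(abs(v - med) for v in vals)
    if mad == 0:
        return []
    return [i for i, v in chunk if 0.6745 * abs(v - med) / mad > threshold]

def distributed_outliers(values, n_parts=2):
    """Run local detection on each chunk in parallel and merge the results."""
    chunks = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local_results = list(pool.map(detect_local, chunks))
    return sorted(i for result in local_results for i in result)

# Hypothetical measurements with two unusual values (15.0 and 4.0).
values = [10.0, 10.2, 9.8, 10.1, 15.0, 9.9,
          10.0, 10.3, 9.7, 10.1, 4.0, 10.2]
flagged = distributed_outliers(values, n_parts=2)
print(flagged)  # [4, 10]
```

Because each chunk uses only its own statistics, a point that looks normal locally can still be unusual globally; a real system would likely need to exchange summaries between nodes to handle that case.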
Outlier Detection in the Semiconductor Manufacturing Industry: Fault Detection and
Quality Control
Outlier detection is especially important in the semiconductor manufacturing industry, where precision and accuracy are critical.
The manufacturing processes generate enormous amounts of data from various sources, such as machine logs, sensors, and quality
control tests.
Detecting outliers in this data can help identify potential faults in the manufacturing process early, thus preventing the production of
faulty chips, reducing waste, and saving costs. For instance, a sudden change in sensor readings during a particular manufacturing
stage could be an outlier, indicating a potential issue in that stage.
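A sudden change in a sensor trace can be sketched by comparing each new reading with a trailing window of earlier readings. The window length, the threshold, the `min_scale` noise floor, and the readings below are hypothetical values chosen for the example.

```python
from statistics import median

def sudden_changes(readings, window=5, threshold=3.5, min_scale=0.05):
    """Indices of readings that deviate strongly from the trailing window."""
    flagged = []
    for t in range(window, len(readings)):
        history = readings[t - window:t]
        med = median(history)
        mad = median(abs(v - med) for v in history)
        # min_scale is an assumed noise floor that avoids division by ~0.
        scale = max(mad, min_scale)
        if 0.6745 * abs(readings[t] - med) / scale > threshold:
            flagged.append(t)
    return flagged

# Hypothetical sensor trace with a spike at index 6.
trace = [5.0, 5.1, 4.9, 5.0, 5.1, 5.0, 7.5, 5.0, 4.9, 5.1]
spikes = sudden_changes(trace)
print(spikes)  # [6]
```

Using the median of the window means the spike itself barely disturbs the baseline for the readings that follow it.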
Moreover, outlier detection can play a significant role in quality control. By identifying anomalies in test data, outlier detection can
help pinpoint chips that may not perform as expected. This can enhance the overall quality of the products, leading to better
reliability and customer satisfaction.
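For test data, one simple screening rule is Tukey's fences: flag any measurement more than 1.5 interquartile ranges outside the quartiles. This is only one of many possible rules and is not named in the article; the die identifiers and measurement values below are invented for the example.

```python
from statistics import quantiles

def iqr_fence_outliers(measurements, factor=1.5):
    """Return the keys whose values fall outside Tukey's fences."""
    q1, _, q3 = quantiles(measurements.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [key for key, value in measurements.items()
            if value < low or value > high]

# Hypothetical results of one test parameter, keyed by die identifier.
test_results = {
    "die_01": 1.20, "die_02": 1.19, "die_03": 1.21, "die_04": 1.45,
    "die_05": 1.20, "die_06": 1.22, "die_07": 1.18,
}
suspect_dies = iqr_fence_outliers(test_results)
print(suspect_dies)  # ['die_04']
```

A die flagged this way may still pass its specification limits; the rule only says that it behaves differently from its peers and may deserve a closer look.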
To summarize, outlier detection plays a pivotal role in enhancing the efficiency, quality, and cost-effectiveness of semiconductor
manufacturing, further highlighting the need for advanced and scalable outlier detection techniques in the industry.
Conclusions
Each outlier detection technique has its own strengths and weaknesses, and the field continues to evolve, warranting continued
research. Progress depends on understanding how each method performs, which problems it addresses, and how the methods
compare. That understanding will inform future work in the field
of outlier detection.

More Related Content

Similar to Outlier Detection in Data Mining An Essential Component of Semiconductor Manufacturing.pptx

Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...
eSAT Journals
 
A Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
A Framework for Periodic Outlier Pattern Detection in Time-Series SequencesA Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
A Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
KaashivInfoTech Company
 
Data Mining for Big Data-Murat Yazıcı
Data Mining for Big Data-Murat YazıcıData Mining for Big Data-Murat Yazıcı
Data Mining for Big Data-Murat Yazıcı
Murat YAZICI, M.Sc.
 
Comparison of fuzzy neural clustering based outlier detection techniques
Comparison of fuzzy   neural clustering based outlier detection techniquesComparison of fuzzy   neural clustering based outlier detection techniques
Comparison of fuzzy neural clustering based outlier detection techniques
IAEME Publication
 

Similar to Outlier Detection in Data Mining An Essential Component of Semiconductor Manufacturing.pptx (20)

NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
 
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSISSEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
SEMI SUPERVISED BASED SPATIAL EM FRAMEWORK FOR MICROARRAY ANALYSIS
 
Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...
 
Fault Detection in Mobile Communication Networks Using Data Mining Techniques...
Fault Detection in Mobile Communication Networks Using Data Mining Techniques...Fault Detection in Mobile Communication Networks Using Data Mining Techniques...
Fault Detection in Mobile Communication Networks Using Data Mining Techniques...
 
Fault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clusteringFault detection of imbalanced data using incremental clustering
Fault detection of imbalanced data using incremental clustering
 
Outlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional DataOutlier Detection Using Unsupervised Learning on High Dimensional Data
Outlier Detection Using Unsupervised Learning on High Dimensional Data
 
4113ijaia09
4113ijaia094113ijaia09
4113ijaia09
 
4113ijaia09
4113ijaia094113ijaia09
4113ijaia09
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
 
A Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
A Framework for Periodic Outlier Pattern Detection in Time-Series SequencesA Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
A Framework for Periodic Outlier Pattern Detection in Time-Series Sequences
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
An Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor NetworkAn Efficient Approach for Outlier Detection in Wireless Sensor Network
An Efficient Approach for Outlier Detection in Wireless Sensor Network
 
Data Mining for Big Data-Murat Yazıcı
Data Mining for Big Data-Murat YazıcıData Mining for Big Data-Murat Yazıcı
Data Mining for Big Data-Murat Yazıcı
 
CREDIT_CARD.ppt
CREDIT_CARD.pptCREDIT_CARD.ppt
CREDIT_CARD.ppt
 
IRJET- Pattern Recognition Process, Methods and Applications in Artificial In...
IRJET- Pattern Recognition Process, Methods and Applications in Artificial In...IRJET- Pattern Recognition Process, Methods and Applications in Artificial In...
IRJET- Pattern Recognition Process, Methods and Applications in Artificial In...
 
The Indispensable Role of Outlier Detection for Ensuring Semiconductor Qualit...
The Indispensable Role of Outlier Detection for Ensuring Semiconductor Qualit...The Indispensable Role of Outlier Detection for Ensuring Semiconductor Qualit...
The Indispensable Role of Outlier Detection for Ensuring Semiconductor Qualit...
 
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
IRJET-	 Fault Detection and Prediction of Failure using Vibration AnalysisIRJET-	 Fault Detection and Prediction of Failure using Vibration Analysis
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
A Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data StreamA Survey on Cluster Based Outlier Detection Techniques in Data Stream
A Survey on Cluster Based Outlier Detection Techniques in Data Stream
 
Comparison of fuzzy neural clustering based outlier detection techniques
Comparison of fuzzy   neural clustering based outlier detection techniquesComparison of fuzzy   neural clustering based outlier detection techniques
Comparison of fuzzy neural clustering based outlier detection techniques
 

More from yieldWerx Semiconductor

More from yieldWerx Semiconductor (20)

Enhancing Quality Control with Statistical Process Control (SPC) in the Semic...
Enhancing Quality Control with Statistical Process Control (SPC) in the Semic...Enhancing Quality Control with Statistical Process Control (SPC) in the Semic...
Enhancing Quality Control with Statistical Process Control (SPC) in the Semic...
 
Harnessing the Power of Yield Management and Statistical Process Control in S...
Harnessing the Power of Yield Management and Statistical Process Control in S...Harnessing the Power of Yield Management and Statistical Process Control in S...
Harnessing the Power of Yield Management and Statistical Process Control in S...
 
Optimizing Semiconductor Yield with Robust WAT and PCM Processes.pptx
Optimizing Semiconductor Yield with Robust WAT and PCM Processes.pptxOptimizing Semiconductor Yield with Robust WAT and PCM Processes.pptx
Optimizing Semiconductor Yield with Robust WAT and PCM Processes.pptx
 
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptxThe Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
 
Intricate Deep Dive into the Enhancement of Yield Management Strategies in Se...
Intricate Deep Dive into the Enhancement of Yield Management Strategies in Se...Intricate Deep Dive into the Enhancement of Yield Management Strategies in Se...
Intricate Deep Dive into the Enhancement of Yield Management Strategies in Se...
 
Unraveling the Secrets to Optimizing Yield in Semiconductor Manufacturing.pptx
Unraveling the Secrets to Optimizing Yield in Semiconductor Manufacturing.pptxUnraveling the Secrets to Optimizing Yield in Semiconductor Manufacturing.pptx
Unraveling the Secrets to Optimizing Yield in Semiconductor Manufacturing.pptx
 
Enhancing Semiconductor Manufacturing through Advanced Wafer Mapping.pptx
Enhancing Semiconductor Manufacturing through Advanced Wafer Mapping.pptxEnhancing Semiconductor Manufacturing through Advanced Wafer Mapping.pptx
Enhancing Semiconductor Manufacturing through Advanced Wafer Mapping.pptx
 
Amplifying the Power of Efficient Semiconductor Production with Next-Gen Wafe...
Amplifying the Power of Efficient Semiconductor Production with Next-Gen Wafe...Amplifying the Power of Efficient Semiconductor Production with Next-Gen Wafe...
Amplifying the Power of Efficient Semiconductor Production with Next-Gen Wafe...
 
The Evolving Landscape of Semiconductor Manufacturing to Mitigate Yield Losse...
The Evolving Landscape of Semiconductor Manufacturing to Mitigate Yield Losse...The Evolving Landscape of Semiconductor Manufacturing to Mitigate Yield Losse...
The Evolving Landscape of Semiconductor Manufacturing to Mitigate Yield Losse...
 
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptxThe Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
The Significance of Enhanced Yield in Semiconductor Manufacturing.pptx
 
Addressing the Challenge of Wafer Map Classification in Semiconductor Manufac...
Addressing the Challenge of Wafer Map Classification in Semiconductor Manufac...Addressing the Challenge of Wafer Map Classification in Semiconductor Manufac...
Addressing the Challenge of Wafer Map Classification in Semiconductor Manufac...
 
Innovating Quality Control in the Semiconductor Manufacturing Industry.pptx
Innovating Quality Control in the Semiconductor Manufacturing Industry.pptxInnovating Quality Control in the Semiconductor Manufacturing Industry.pptx
Innovating Quality Control in the Semiconductor Manufacturing Industry.pptx
 
A Holistic Approach to Yield Improvement in the Semiconductor Manufacturing I...
A Holistic Approach to Yield Improvement in the Semiconductor Manufacturing I...A Holistic Approach to Yield Improvement in the Semiconductor Manufacturing I...
A Holistic Approach to Yield Improvement in the Semiconductor Manufacturing I...
 
Process Control Monitoring (PCM) and Wafer Acceptance Test (WAT) in the Semic...
Process Control Monitoring (PCM) and Wafer Acceptance Test (WAT) in the Semic...Process Control Monitoring (PCM) and Wafer Acceptance Test (WAT) in the Semic...
Process Control Monitoring (PCM) and Wafer Acceptance Test (WAT) in the Semic...
 
Improving Yield and Quality in Semiconductor Manufacturing with Indispensable...
Improving Yield and Quality in Semiconductor Manufacturing with Indispensable...Improving Yield and Quality in Semiconductor Manufacturing with Indispensable...
Improving Yield and Quality in Semiconductor Manufacturing with Indispensable...
 
Essentials of Gauge R&R in Ensuring Quality in Semiconductor Manufacturing.pptx
Essentials of Gauge R&R in Ensuring Quality in Semiconductor Manufacturing.pptxEssentials of Gauge R&R in Ensuring Quality in Semiconductor Manufacturing.pptx
Essentials of Gauge R&R in Ensuring Quality in Semiconductor Manufacturing.pptx
 
Maximizing Production Efficiency with Big Data Analytics in semiconductor Man...
Maximizing Production Efficiency with Big Data Analytics in semiconductor Man...Maximizing Production Efficiency with Big Data Analytics in semiconductor Man...
Maximizing Production Efficiency with Big Data Analytics in semiconductor Man...
 
Analytics Solutions for the Semiconductor Manufacturing Industry.pptx
Analytics Solutions for the Semiconductor Manufacturing Industry.pptxAnalytics Solutions for the Semiconductor Manufacturing Industry.pptx
Analytics Solutions for the Semiconductor Manufacturing Industry.pptx
 
Understanding the Dynamics of Semiconductor Manufacturing Yield Analysis and ...
Understanding the Dynamics of Semiconductor Manufacturing Yield Analysis and ...Understanding the Dynamics of Semiconductor Manufacturing Yield Analysis and ...
Understanding the Dynamics of Semiconductor Manufacturing Yield Analysis and ...
 
Enhancing Yield in IC Design and Elevating YMS with AI and Machine Learning.pptx
Enhancing Yield in IC Design and Elevating YMS with AI and Machine Learning.pptxEnhancing Yield in IC Design and Elevating YMS with AI and Machine Learning.pptx
Enhancing Yield in IC Design and Elevating YMS with AI and Machine Learning.pptx
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Outlier Detection in Data Mining An Essential Component of Semiconductor Manufacturing.pptx

  • 1. Outlier Detection in Data Mining: An Essential Component of Semiconductor Manufacturing https://yieldwerx.com/
  • 2. Outlier detection is a critical research field within data mining due to its vast range of applications including fraud detection, cybersecurity, health diagnostics, and significantly for the semiconductor manufacturing industry. It refers to identifying data points that significantly deviate from expected patterns, providing crucial insights into different aspects of data. However, the ambiguity between outliers and normal behavior, evolving definitions of 'normal', application-specific techniques, and noisy data mimicking outliers, often complicate the outlier detection process. This review article offers an in-depth analysis of the most advanced outlier detection methods, presenting a thorough understanding of future research prospects. Defining Outliers The term outlier refers to a data point that significantly deviates from the expected behavior or is substantially dissimilar from others within a dataset. Various causes contribute to outliers, including mechanical faults, changes in system behavior, human errors, and environmental alterations. The identification and handling of outliers remain a complex, ongoing process in machine learning and data mining. This procedure often goes by numerous terms such as outlier mining, novelty detection, outlier modeling, anomaly detection, and more. Techniques for Outlier Detection The approaches to identifying outliers are many and varied, each leveraging different principles for the purpose. Highlighted below are the key methods of outlier detection: Statistical-Based Methods This technique operates based on the deviation of a data point from a statistical model. It assumes that regular data points occur in high-probability regions of a stochastic model, while outliers are the residents of low-probability areas. Distance-Based Methods Distance-based methods focus on the relative distance of a data point from other points. 
An outlier, in this context, is a data point that lies an exceptionally far-off distance from others.
  • 3. Density-Based Methods This approach classifies sparse regions as outliers compared to denser parts. The central idea is that a data point located in a low-density region is likely to be an outlier. Clustering-Based Methods Clustering-based techniques classify data points as outliers if they do not belong to any cluster or if they are far from their nearest cluster centroid. Graph-Based Methods By constructing a graph that represents the relationships among data points, graph-based methods identify outliers as nodes with characteristics substantially different from others. Ensemble-Based Methods These methods often combine multiple outlier detection techniques to produce a more robust and accurate detection process. Learning-Based Methods Often using supervised or semi-supervised machine learning models, these techniques learn the normal behavior patterns from labeled data and classify the deviating instances as outliers. Handling Outliers Handling outliers remains a contentious topic. In some cases, outliers are viewed as erroneous data and discarded, but in other instances, they are treated as integral parts of the dataset. Eliminating outliers from accurate data may lead to the loss of critical information. Several techniques, such as visual examination, univariate and multivariate methods, and minimizing outliers during training, have been proposed for outlier handling. Overall, the approach to handling outliers largely depends on the context and often requires analytical reasoning, intuition, and deliberate decision-making.
  • 4. Applications of Outlier Detection The applications of outlier detection span across a plethora of domains such as data and process logs, fraud and intrusion detection, security and surveillance, healthcare and medical diagnostics, transactional data sources, sensor networks and databases, data quality and cleaning, time-series monitoring and data streams, and Internet of Things (IoT). Significantly, in the semiconductor manufacturing industry, outlier detection can play a vital role in detecting anomalies in manufacturing processes, hence leading to improved quality control, fault detection, and lot control in manufacturing. Emerging Techniques: Deep Learning and Ensemble Approaches Recent years have seen increased interest in leveraging deep learning and ensemble techniques for outlier detection. Deep learning-based approaches, primarily autoencoders and deep neural networks (DNNs) have demonstrated promising results in detecting complex and subtle outliers, especially in high-dimensional data. For example, Autoencoder, a popular deep learning architecture, is trained to reconstruct its input data. The reconstruction error is then used to determine the anomaly score. A high error indicates that the data point is hard to model, thus an outlier. Ensemble techniques combine multiple outlier detection models to increase robustness and accuracy. They often use various base detection algorithms or multiple configurations of a single base algorithm. The final decision is usually based on a majority vote, average, or another combination rule of the base detectors' results. Both these techniques have promising applications in the semiconductor industry. They can detect minute faults or anomalies in the manufacturing processes that may be overlooked by traditional methods, potentially saving significant resources and increasing overall efficiency.
  • 5. The Challenge of Scalability and the Role of Distributed Detection Techniques
As data size increases, the number of outliers and the computational cost of detecting them also increase, making the process slow and costly. This is especially relevant in the semiconductor manufacturing industry, where terabytes of data are generated daily. Scalable outlier detection techniques therefore become necessary for large datasets.
To address this, distributed outlier detection techniques have been proposed. They partition the original data into several subsets and distribute them across the nodes of a distributed system for parallel processing. After local outlier detection is performed on each node, the results are aggregated to produce the final outcome. These techniques are effective in managing large datasets, reducing computational costs, and speeding up the detection process.
Outlier Detection in the Semiconductor Manufacturing Industry: Fault Detection and Quality Control
Outlier detection is especially important in the semiconductor manufacturing industry, where precision and accuracy are critical. Manufacturing processes generate enormous amounts of data from various sources, such as machine logs, sensors, and quality control tests. Detecting outliers in this data can help identify potential faults in the manufacturing process early, preventing the production of faulty chips, reducing waste, and saving costs. For instance, a sudden change in sensor readings during a particular manufacturing stage could be an outlier that indicates a potential issue in that stage.
Outlier detection can also play a significant role in quality control. By identifying anomalies in test data, it can help pinpoint chips that may not perform as expected. This can enhance the overall quality of the products, leading to better reliability and customer satisfaction.
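The sensor example above, a sudden change in readings during a manufacturing stage, can be sketched as a comparison against a rolling baseline. This is a minimal Python illustration under assumptions: the window size, the threshold, the noise floor min_spread, the function name sudden_changes, and the pressure values are all hypothetical and would need tuning for a real tool.

```python
import statistics
from collections import deque

def sudden_changes(readings, window=5, threshold=4.0, min_spread=0.05):
    """Return indices of readings that jump away from the recent baseline.

    The baseline is the median of the last `window` accepted readings and
    the spread is their median absolute deviation, floored at `min_spread`
    (an assumed noise floor) so that a very flat history does not make
    every small fluctuation look anomalous.
    """
    history = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(readings):
        if len(history) == window:
            baseline = statistics.median(history)
            spread = statistics.median([abs(h - baseline) for h in history])
            spread = max(spread, min_spread)
            if abs(value - baseline) / spread > threshold:
                flagged.append(index)
                continue  # keep the flagged reading out of the baseline
        history.append(value)
    return flagged

# Hypothetical chamber-pressure trace with one spike at index 7.
pressure = [100.1, 100.0, 99.9, 100.2, 100.0, 100.1, 99.8, 104.5, 100.0, 100.1]
spikes = sudden_changes(pressure)
print(spikes)
```

Because flagged readings are excluded from the baseline, a sustained shift in level keeps being flagged until someone intervenes, which suits alerting but not automatic re-baselining.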
To summarize, outlier detection plays a pivotal role in enhancing the efficiency, quality, and cost-effectiveness of semiconductor manufacturing, further highlighting the need for advanced and scalable outlier detection techniques in the industry.
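The partition, detect locally, and aggregate scheme described in this section can be sketched as follows. This is a simplified Python illustration, assuming threads as stand-ins for the nodes of a distributed system and a median-absolute-deviation rule as the local detector; the function names, thresholds, and data are hypothetical.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

def partition(values, n_parts):
    """Split values into contiguous chunks, keeping each global index."""
    indexed = list(enumerate(values))
    size = -(-len(indexed) // n_parts)  # ceiling division
    return [indexed[i:i + size] for i in range(0, len(indexed), size)]

def detect_local(chunk, threshold=3.5):
    """Local step: return global indices whose MAD-based score is too high."""
    local_values = [v for _, v in chunk]
    med = statistics.median(local_values)
    mad = statistics.median([abs(v - med) for v in local_values])
    if mad == 0:
        return []
    return [i for i, v in chunk if 0.6745 * abs(v - med) / mad > threshold]

def detect_distributed(values, n_parts=3):
    """Run the local detector on each partition in parallel, then merge."""
    chunks = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local_results = list(pool.map(detect_local, chunks))
    # Aggregation step: union of the per-partition results.
    return sorted(i for result in local_results for i in result)

# Hypothetical data: three partitions of six readings each.
data = [
    10.0, 10.2, 9.9, 10.1, 25.0, 10.0,
    10.1, 9.8, 10.0, 10.2, 9.9, 10.1,
    10.0, -3.0, 10.1, 9.9, 10.2, 10.0,
]
outlier_indices = detect_distributed(data, n_parts=3)
print(outlier_indices)
```

Each partition here uses only its own statistics, so the merged result can differ from a single pass over all the data. Schemes that first share summary statistics between nodes reduce that gap at the cost of an extra communication round.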
  • 6. Conclusions
While each outlier detection technique has its own strengths and weaknesses, the field continues to evolve, warranting continuous research and advancement. This evolution calls for a comprehensive understanding of each method's performance, the issues it addresses, and how the methods compare with one another. This understanding will provide invaluable insights for future work in the field of outlier detection.