Debugging of Hadoop Production
Errors Using Open Source Flow
Analyzer – Semiconductor
Industry Case Study
Reach the community at users@collaborate.jumbune.org
Download Jumbune from http://www.jumbune.org
An Open Source initiative LGPLv3 licensed
Table of Contents
I. Overview
II. Business Challenge and Future Proofing
III. Use Case
IV. MapReduce Debugging using Jumbune’s Debugger
V. Impact: Zero issues in Production
Overview
A renowned manufacturer of microprocessors and GPUs depends on data generated during manufacturing to improve its processors and manufacturing engineering processes, detect flaws during fabrication, and determine when a specific step in one of its manufacturing processes starts to deviate from normal tolerances. This data includes logs from fabrication, which contain temperature, stage and wafer details, as well as logs from the multi-tier tests that are run during each phase of the manufacturing process.
Business Challenge and Future Proofing
The company traditionally used an RDBMS to store all current and historical data. As the data grew and the costs of running analytics and maintaining the database multiplied, the company decided to migrate to Apache Hadoop and create an enterprise data lake on HDFS (Hadoop Distributed File System) for faster analytics. All data, from wafer processing and die preparation through IC packaging and IC testing, would be pushed to the HDFS enterprise data lake. One of the initial challenges faced by the data engineering team was partitioning the data into the respective departments and/or sections for faster querying and accessibility. The team decided to implement the partitioning system with MapReduce.
Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them and re-executing the failed tasks.
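The map, sort and reduce steps described above can be sketched in plain Java without a cluster. This is only an in-memory illustration and does not use the Hadoop API; the class name, the record layout `waferId,testName,result` and the pass-counting logic are assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of the map -> sort -> reduce flow.
// Not the Hadoop API; the "waferId,testName,result" layout is an assumption.
class MapReduceFlowSketch {

    // Map: turn one input record into zero or more (key, value) pairs.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        String[] fields = record.split(",");
        if (fields.length == 3) {
            // Key is the test name; value is 1 for a pass and 0 otherwise.
            int passed = "PASS".equals(fields[2]) ? 1 : 0;
            out.add(new SimpleEntry<String, Integer>(fields[1], passed));
        }
        return out;
    }

    // Sort/group: collect the values per key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<Integer>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: sum the values that share one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in order and returns the pass count per test name.
    static Map<String, Integer> run(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<Map.Entry<String, Integer>>();
        for (String record : records) {
            mapped.addAll(map(record));
        }
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> e : sortAndGroup(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "W1,leakage,PASS", "W2,leakage,FAIL", "W3,leakage,PASS", "W1,speed,PASS");
        System.out.println(run(records));
    }
}
```

In a real job the framework runs many map and reduce tasks on different nodes and handles the sorting between them; here everything runs sequentially in one process so the data flow is easy to follow.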
Use Case
The environments in the organization have heterogeneous Hadoop setups: Development and QA use Hadoop 2.4.1 with YARN, whereas the Production environment runs a commercial distribution of Hadoop. A system was put in place to partition data on HDFS according to source, type and ingestion period, and it ran smoothly until the QA team found inconsistencies in the partitioned data. During semiconductor device fabrication, once the front-end process has been completed, the devices are subjected to a variety of electrical tests to determine whether they function properly. The fab tests the chips on the wafer with an electronic tester that presses tiny probes against the chip. The machine marks each bad chip with a drop of dye, the result is logged into a central repository, and chips are sorted into virtual bins according to predetermined test limits. The resulting data can be graphed, or logged, on a wafer map to trace manufacturing defects and mark bad chips. This map can also be used during wafer assembly and packaging. Chips are tested again after packaging, and those results are also logged into the repository. All the respective log and result data were dumped onto the HDFS data lake for further analysis.
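Partitioning by source, type and ingestion period, as mentioned at the start of this section, could look roughly like the sketch below. The case study does not give the actual directory layout, so the `/datalake` root, the `key=value` path segments and the monthly period are all assumptions chosen for illustration.

```java
import java.time.LocalDate;
import java.time.YearMonth;

// Builds a partition directory from source, type and ingestion period.
// The layout is hypothetical; the case study does not describe the real one.
class PartitionPathSketch {

    // Assumed root directory of the data lake.
    static final String ROOT = "/datalake";

    static String partitionPath(String source, String type, LocalDate ingestedOn) {
        // Assumption: the ingestion period is the calendar month of ingestion.
        YearMonth period = YearMonth.from(ingestedOn);
        return ROOT + "/source=" + source + "/type=" + type + "/period=" + period;
    }

    public static void main(String[] args) {
        System.out.println(partitionPath("wafer-probe", "test-log", LocalDate.of(2015, 3, 14)));
    }
}
```

Keeping the partition key derivation in one small function like this makes it easier to check in isolation that records land where queries expect them.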
The data from the test phases was categorized by source, result and environment for pattern recognition. The QA team found many inconsistencies between the test data from the wafer phase and the packaging phase. A large number of wafers that had passed the failure threshold during fabrication were found to be unresponsive during the later phases, which led to a lot of effort being spent on finding the source of the issue. Multiple potential failure points, such as the test probe, the test bed, the MapReduce system that sorts the data, and the final quality checks, were analyzed to zero in on the source of the discrepancy. After many man-hours spent testing the test equipment and the different phases of manufacturing, it was concluded that the discrepancies were introduced during the MapReduce phase.
MapReduce Debugging using Jumbune’s Debugger
MapReduce debugging is a tiresome but unavoidable routine for any Hadoop developer. An effective debugger can make programmers more productive by allowing them to dedicate more time to implementing use cases. A debugger should allow the programmer to inspect the code at the site of an anomaly and trace back through the program stack to derive more clues about the cause of the problem.
• A successful debugger should collect data about the billions of key-value pairs that go through each component, phase and control block of the MapReduce application.
• Such an extensive system should be frugal, scalable and yet detailed.
o It must be frugal, so that the instrumentation does not inflate the execution time of the code.
o It must collect information that is detailed enough to let developers understand the bottlenecks and faults in their program.
o It must scale to large data sets that accurately mimic all the use cases the developers have in mind.
Jumbune’s MapReduce debugger correlates data at the cluster level with the execution flow inside the code. It can cut hours of trial-and-error debugging of execution logic by identifying root causes within minutes. Users may apply regex validations or user-defined validation classes. According to the applied validation, the Flow Debugger checks the flow of input data tuples, essentially <key, value> pairs, for each mapper and reducer in the submitted job.
• Jumbune provides a comprehensive table/chart view depicting the flow of input records through the job.
• The flow is displayed at job level, MapReduce level, and instance level.
• Unmatched keys/values represent the number of key/value records that flowed through the job unexpectedly, that is, without satisfying the applied validation.
• The debugger drills down into the code to examine the flow of data through control structures such as loops, if conditions and else-if branches, keeping a counter for each.
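The idea of counting unmatched keys and values against a regex validation can be illustrated as follows. This sketch uses only `java.util.regex` and is not Jumbune's API; the class name, the method and the example patterns are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Counts key/value pairs that fail a regex validation.
// Illustration only; this is not Jumbune's implementation or API.
class RegexFlowCheckSketch {

    // Returns {unmatchedKeys, unmatchedValues} for the given pairs.
    // Each pair is a two-element array: {key, value}.
    static int[] countUnmatched(List<String[]> pairs, Pattern keyPattern, Pattern valuePattern) {
        int unmatchedKeys = 0;
        int unmatchedValues = 0;
        for (String[] kv : pairs) {
            // matches() requires the whole string to match the pattern.
            if (!keyPattern.matcher(kv[0]).matches()) {
                unmatchedKeys++;
            }
            if (!valuePattern.matcher(kv[1]).matches()) {
                unmatchedValues++;
            }
        }
        return new int[] {unmatchedKeys, unmatchedValues};
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"W-0001", "PASS"},
                new String[] {"W-0002", "pass"},   // value has unexpected casing
                new String[] {"X9", "FAIL"});      // key has unexpected format
        int[] counts = countUnmatched(pairs, Pattern.compile("W-\\d{4}"), Pattern.compile("PASS|FAIL"));
        System.out.println("unmatched keys=" + counts[0] + ", unmatched values=" + counts[1]);
    }
}
```

A debugger that applies such checks at each mapper and reducer can report where in the job the unexpected records first appear, rather than only that the final output is wrong.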
With the help of Jumbune’s flow debugger, the team created custom validation classes in Java for each test. With these, they were able to correlate the number of tests that passed with the actual results, and detected that during the partitioning of data in the MapReduce phase, the pass threshold for a certain critical test was interpreted incorrectly, which led to a lot of valid wafer data being diagnosed as faulty.
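A custom validation class for a threshold test might resemble the sketch below. The case study does not say how the threshold was misread, so the rule "a reading at or below the limit passes" and the inverted comparison in `buggyPass` are assumptions for illustration. The `PairValidator` interface is hypothetical and is not Jumbune's actual validation interface.

```java
import java.util.Arrays;
import java.util.List;

// Cross-checks a partitioner's pass/fail decision against an independent validator.
// All names and the threshold rule are hypothetical.
class ThresholdCheckSketch {

    // Hypothetical validation contract; NOT Jumbune's real interface.
    interface PairValidator {
        boolean isValid(String key, String value);
    }

    // Assumed rule: a reading at or below the limit is a pass.
    static final class MaxLimitValidator implements PairValidator {
        private final double limit;

        MaxLimitValidator(double limit) {
            this.limit = limit;
        }

        @Override
        public boolean isValid(String key, String value) {
            return Double.parseDouble(value) <= limit;
        }
    }

    // Stand-in for partitioning logic that reads the threshold the wrong way round.
    static boolean buggyPass(double reading, double limit) {
        return reading >= limit;
    }

    // Counts records where the partitioner and the validator disagree.
    // Each pair is a two-element array: {waferId, reading}.
    static int countMisclassified(List<String[]> pairs, PairValidator validator, double limit) {
        int mismatches = 0;
        for (String[] kv : pairs) {
            boolean expected = validator.isValid(kv[0], kv[1]);
            boolean actual = buggyPass(Double.parseDouble(kv[1]), limit);
            if (expected != actual) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"W1", "0.8"},
                new String[] {"W2", "1.0"},
                new String[] {"W3", "1.4"});
        int mismatches = countMisclassified(pairs, new MaxLimitValidator(1.0), 1.0);
        System.out.println("misclassified records: " + mismatches);
    }
}
```

A non-zero mismatch count on real data points to the comparison in the partitioning code rather than to the test equipment, which is the kind of conclusion the team reached.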
Impact: Zero issues in Production
Production errors and failures result in significant loss of revenue and time; enterprise solutions have zero tolerance for them. Jumbune’s Debugger has helped this organisation get rid of production issues altogether.
