
Revealing the Power of Legacy Machine Data


Description

ENGEL, founded in 1945, is now the global market leader for injection moulding machines. Since then, and especially in recent years, the amount of machine data has grown immensely and has become increasingly heterogeneous with each new generation of machine controls. A closer look at the accumulated log files of the individual machines reveals 13 different timestamp formats, several archive types and further peculiarities of each control generation, all of which has made automatically processing and analysing the data difficult.

Transcript

  1. Revealing the Power of Legacy Machine Data. Oliver Lemp, ENGEL Austria GmbH
  2. ENGEL: Injection Moulding Machines ▪ Austria’s largest machine manufacturer ▪ Market leader for injection moulding machines ▪ Machines to manufacture plastic products
  3. Customer Proximity. Business units: Automotive, Medical, Packaging, Technical Moulding, Teletronics
  4. Setting the Scene: The ENGEL Customer Service. The customer reports a problem at the machine → 1st level support → if the problem cannot be solved immediately, a field engineer is sent → Repair & Maintenance (collect error reports, analyse & fix errors) → collect feedback
  5. Analysis in the Past: Excel as the tool of choice
  6. An Idea Popped Into Our Heads
  7. The Use Case. Goal: self-service tools for DIY error analysis along the chain customer → 1st level support → field engineer → repair & maintenance
  8. The Use Case: Fault Discovery Assistance ▪ Use data science to assist customer support ▪ Classified error documentation (symptoms, errors, solutions) ▪ Detect error patterns automatically / rule-based ▪ Reduce maintenance and repair times ▪ Predict future errors ▪ Detect / discover serial defects ▪ Generate sustainable knowledge ▪ Fast onboarding of new employees ▪ Focus on fixing problems efficiently ▪ Create data-driven solutions
  9. Challenges of the Use Case: The starting situation
  10. Challenges of the Use Case: ENGEL Error Reports ▪ Zipped collection of (serialised) logfiles ▪ Snapshot of the machine’s parameters ▪ Last X errors on the machine ▪ Fault discovery & documentation ▪ Customer support can derive wrong settings from the reports ▪ Different data formats for different control generations ▪ Legacy data (no standard in the chosen data formats) ▪ Collected since approx. 1990 ▪ Ranging from simple text files to recursive archives
  11. Challenges of the Use Case: Report Structure. Recursive archive structure ▪ Logfiles (partially binary serialised) ▪ Memory dumps ▪ Parameter snapshots ▪ … Issues ▪ 13 different timestamp formats (see the parsing sketch below) ▪ Different structure for each control generation ▪ Broken archives ▪ Missing files ▪ …
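The 13 timestamp formats are one of the more mechanical problems to solve. Below is a minimal, hedged sketch of a format-probing parser in Java; the three patterns shown are illustrative assumptions, not the actual formats found in the reports.

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.time.format.DateTimeParseException;
    import java.util.List;

    public class TimestampNormaliser {

        // Hypothetical subset of the legacy patterns; the real reports contain 13 different formats.
        private static final List<DateTimeFormatter> CANDIDATE_FORMATS = List.of(
                DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"),
                DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss"),
                DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm"));

        // Try each known pattern in turn and return the first successful parse.
        public static LocalDateTime parse(String raw) {
            for (DateTimeFormatter format : CANDIDATE_FORMATS) {
                try {
                    return LocalDateTime.parse(raw.trim(), format);
                } catch (DateTimeParseException ignored) {
                    // fall through to the next candidate format
                }
            }
            throw new IllegalArgumentException("Unknown timestamp format: " + raw);
        }
    }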
  12. Prototyping a Solution: Hadoop seemed to be in fashion ▪ The Hortonworks stack looked promising ▪ Apache NiFi, Spark, Kafka and HDFS as our core components ▪ Starting with a small cluster of 5 Raspberry Pis ▪ Establishing a production environment on dedicated hardware ▪ On-premise hosting
  13. Prototyping a Solution: Lambda Architecture on Hortonworks HDP. Upload report → Apache NiFi (data ingestion & routing; writes meta attributes + file path to Kafka) → Apache Kafka (event) stream → batch & stream processing: process metadata, process parameters (Parquet), store raw data blob in HDFS → BI, web & mobile apps
  14. New Difficulties Arise: A non-standard use case ▪ Maintaining streaming + batch jobs ▪ Kafka and large files ▪ Reading from multiple systems (Kafka + HDFS) ▪ Hadoop and small files ▪ Legacy binary deserialisation ▪ Pascal(!) JNA wrapper ▪ Unpredictable (parameter) data ▪ Binaries with 200,000 and up to 3 million variables per error report. High complexity and a workaround for streaming large binaries:

    JavaDStream<String> json = kafkaStream.map(ConsumerRecord::value);
    json.foreachRDD(rdd -> {
        Dataset<Row> df = sparkSession.read().json(rdd);
        if (df.count() >= 1) {
            // Collect the HDFS paths of the reports referenced in this micro-batch.
            List<String> hdfsPaths = df
                .select("`hdfs.filepath`")
                .javaRDD()
                .map(row -> row.getString(0))
                .collect();
            String joinedPaths = String.join(",", hdfsPaths);
            SampleProcessor sampleProcessor = new SampleProcessor();
            // Re-read the binary reports from HDFS and unpack the tar archives.
            JavaRDD<String> binaries = javaSparkContext.binaryFiles(joinedPaths)
                .map(report -> new TarArchiveInputStream(report._2.open()))
                .map(sampleProcessor::call);
        } else {
            Log.info("No records in this batch");
        }
    });
  15. Partitioning System Variables: Flattening the tree structure ▪ Spark and Parquet files to store system variables ▪ No efficient and economic database was found (see the Parquet sketch below)

    Timestamp,FabNr,VarName,Unit,IntValue,DoubleValue,StringValue,BoolValue
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure,,0.174087137,,
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure_sim,,,,,
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure_stat,,,,false
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuChargeMainPump,,,,false
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuInject,,,,
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuOff,,,,
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuSafety,,,,false
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.er_AccuPressMin,,,,
    2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.evAnaDisEn,,,,
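Since no suitable database was found, the flattened rows go to Parquet via Spark. A minimal sketch of what that write could look like with the Spark Java API, assuming the rows above are available as CSV; the file paths are placeholders, not from the slides.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class VariablesToParquet {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("VariablesToParquet").getOrCreate();

            // Read the flattened system-variable rows (schema as in the sample above).
            Dataset<Row> variables = spark.read()
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .csv("/data/flattened_variables.csv");   // placeholder path

            // Persist as Parquet so readers can prune columns and skip row groups.
            variables.write().mode("overwrite").parquet("/data/variables.parquet"); // placeholder path

            spark.stop();
        }
    }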
  16. Partitioning System Variables: Optimising for point lookups ▪ Ideally partitioned by function unit (the first part of the variable name) ▪ Grouped by machine components ▪ Custom hash-based partitioning (see the sketch below) ▪ Not time-series data ▪ Very good for point lookups, not so much for regex searches. Query examples:

    SELECT * FROM variables WHERE varName = "AccuGeneral1.ai_Pressure" AND fabNr = "XXX"
    SELECT * FROM variables WHERE varName = "AccuGeneral1.ai_Pressure"
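A hedged sketch of what the custom hash-based partitioning on the function unit could look like with the Spark Java API; apart from the VarName column from the sample above, the column name, partition count and paths are assumptions for illustration.

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.split;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PartitionByFunctionUnit {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("PartitionByFunctionUnit").getOrCreate();

            Dataset<Row> variables = spark.read().parquet("/data/variables.parquet"); // placeholder path

            // Derive the function unit, e.g. "AccuGeneral1" from "AccuGeneral1.ai_Pressure".
            Dataset<Row> withUnit = variables.withColumn(
                    "FunctionUnit", split(col("VarName"), "\\.").getItem(0));

            // Hash-partition on the function unit so that all variables of one machine
            // component land in the same partition: cheap point lookups, poor regex scans.
            withUnit.repartition(200, col("FunctionUnit"))
                    .write()
                    .mode("overwrite")
                    .parquet("/data/variables_partitioned.parquet"); // placeholder path

            spark.stop();
        }
    }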
  17. Issues in This Architecture: The real show stoppers ▪ Upgrades and migrations ▪ Job monitoring / Ganglia metrics ▪ Repartitioning job ▪ Merging streaming and batch files ▪ Unpredictable errors in batch jobs ▪ Memory bombs / issues ▪ Spark 2.x has no binary file source, so we had to work with RDDs ▪ This architecture feels like one big workaround

    WARN TaskSetManager: Lost task 53.0 in stage 49.0 (TID 32715, XXXXXXXXXX): ExecutorLostFailure
    (executor 23 exited caused by one of the running tasks) Reason: Container killed by YARN for
    exceeding memory limits. 12.4 GB of 12 GB physical memory used.
    Consider boosting spark.yarn.executor.memoryOverhead.
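The YARN warning above already names the usual mitigation. As a hedged illustration, assuming Spark 2.x property names and a made-up overhead value, the setting could be raised when building the job's configuration (or equivalently via --conf on spark-submit):

    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.SparkSession;

    public class ReportBatchConfig {
        public static void main(String[] args) {
            // Reserve extra off-heap headroom so YARN does not kill executors while
            // large binary reports are being deserialised. 3 GB is an assumed value.
            SparkConf conf = new SparkConf()
                    .setAppName("ErrorReportBatch")
                    .set("spark.executor.memory", "12g")
                    .set("spark.yarn.executor.memoryOverhead", "3072"); // MiB, Spark 2.x property name

            SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
            // ... batch job as before ...
            spark.stop();
        }
    }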
  18. Working Towards an Optimal Solution? Discovering Azure & Databricks. Upload report → Azure Data Lake Storage → Auto Loader → process metadata, process parameters (Parquet), store raw data blob → Azure Cosmos DB → BI, web & mobile apps
  19. Working Towards an Optimal Solution? Parameter partitioning ▪ Equidistant range partitioning ▪ Based on the lexical order of the parameter names ▪ Roughly the same number of parameters per partition ▪ Allows searching for variables in the same component / root node ▪ Better data skipping ▪ OPTIMIZE instead of repartitioning jobs (see the sketch below). Example: A.xxx B.xxx C.xxx D.xxx E.xxx F.xxx split across Partition 1, Partition 2, …
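OPTIMIZE here refers to the Delta Lake compaction command available on Databricks. A minimal sketch of how it might be issued from the Spark job instead of running separate repartitioning jobs; the table name `variables` is an assumption carried over from the query examples.

    import org.apache.spark.sql.SparkSession;

    public class CompactVariables {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("CompactVariables").getOrCreate();

            // Compact the many small files produced by streaming ingestion into larger ones.
            spark.sql("OPTIMIZE variables");

            spark.stop();
        }
    }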
  20. Working Towards an Optimal Solution? Azure & Databricks benefits ▪ Reduced complexity ▪ One single configurable Spark job ▪ Kafka replaced by Auto Loader (see the sketch below) ▪ Unified batch & streaming ▪ … ▪ Reduced memory pressure ▪ Micro-batches instead of full batches ▪ Stable jobs ▪ Monitoring ▪ JVM / Ganglia metrics
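Auto Loader is Databricks' incremental file-ingestion source (format "cloudFiles"). A rough sketch, under assumed paths and options, of how uploaded reports might be picked up as a micro-batch stream with the Spark Java API:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;
    import java.util.concurrent.TimeoutException;

    public class ReportAutoLoader {
        public static void main(String[] args) throws TimeoutException {
            SparkSession spark = SparkSession.builder().appName("ReportAutoLoader").getOrCreate();

            // Incrementally discover newly uploaded error reports in the data lake.
            Dataset<Row> reports = spark.readStream()
                    .format("cloudFiles")                       // Databricks Auto Loader
                    .option("cloudFiles.format", "binaryFile")  // reports are binary archives
                    .load("abfss://reports@account.dfs.core.windows.net/incoming"); // placeholder

            // Each micro-batch is handled by the single configurable job described above.
            StreamingQuery query = reports.writeStream()
                    .option("checkpointLocation", "/mnt/checkpoints/reports") // placeholder
                    .format("delta")
                    .start("/mnt/raw/reports");                               // placeholder
        }
    }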
  21. The Current State: Self-service tools instead of manual work. Currently ▪ Established self-service tools ▪ Send a PDF summary to the field engineers ▪ Steps that could solve the problem ▪ Things to also check at the machine. In future ▪ Automatically detect and classify errors
  22. Key Takeaways ▪ Don’t underestimate the effort needed to process legacy data ▪ The unknown in this data jungle is quite daunting ▪ Unforeseeable things will happen ▪ Change management in a traditional company is really demanding ▪ Running an unmanaged cluster without dedicated resources leads to pure frustration ▪ Moving to the cloud reduced the complexity of our pipelines by a lot
  23. Feedback. Your feedback is important to us. Don’t forget to rate and review the sessions.
