Successfully reported this slideshow.
Your SlideShare is downloading. ×

Predictive Maintenance at the Dutch Railways with Ivo Everts

Ad

Ivo Everts, GoDataDriven (Amsterdam, NL)
Predictive Maintenance
at the Dutch Railways
Air leakage detection in train break...

Ad

#DSSAIS16 !2
Tünde Alkemade
Ivo Everts
Jelte Hoekstra
Giovanni Lanzani
Janelle Zoutkamp
Stefan Hendriks
Wouter Hordijk
Ing...

Ad

#DSSAIS16
Published paper in June 2017
'Contextual Air Leakage Detection in Train Braking Pipes'
@ 'Advances in Artificial...

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Check these out next

1 of 32 Ad
1 of 32 Ad

Predictive Maintenance at the Dutch Railways with Ivo Everts

Download to read offline

At the Dutch Railways, we are collecting 10s of billions sensor measurements coming from the train fleet and railroad every year. We use these data to conduct predictive maintenance, such as predicting failure of the train axle bearings and detecting air leakage in the train braking pipes. This is extremely useful, as these failures are notoriously difficult to detect during regular maintenance, while occurring frequently and causing severe delays, damage to material and reputy, and costs.

In this talk, we present how we use the compressor logs in order to detect the occurrence of air leakage in the train braking pipes. Compressor run- and idle- times are extracted from the logs and modelled by a logistic regressor for discriminating between the two classes in normal operational mode. Air leakage will cause the idle times to become shorter as air pressure needs to be leveled more frequently, which can be detected with the logistic model. Then, with a density-based clustering technique, a sequence of such events can be identified, while ignoring outliers due to circumstantial phenomena such as power outage. These clusters are associated to levels of severity, based on which a trend analysis can unlock the expected number of days the compressor will still function before breaking down. This method has been developed by Wan-Jui Lee of the Dutch Railways and published as “Anomaly Detection and Severity Prediction of Air Leakage in Train Braking Pipes” in the International Journal of Prognostics and Health Management in 2017. We have implemented the methods as described in the paper using Python and Spark in a production environment.

At the Dutch Railways, we are collecting 10s of billions sensor measurements coming from the train fleet and railroad every year. We use these data to conduct predictive maintenance, such as predicting failure of the train axle bearings and detecting air leakage in the train braking pipes. This is extremely useful, as these failures are notoriously difficult to detect during regular maintenance, while occurring frequently and causing severe delays, damage to material and reputy, and costs.

In this talk, we present how we use the compressor logs in order to detect the occurrence of air leakage in the train braking pipes. Compressor run- and idle- times are extracted from the logs and modelled by a logistic regressor for discriminating between the two classes in normal operational mode. Air leakage will cause the idle times to become shorter as air pressure needs to be leveled more frequently, which can be detected with the logistic model. Then, with a density-based clustering technique, a sequence of such events can be identified, while ignoring outliers due to circumstantial phenomena such as power outage. These clusters are associated to levels of severity, based on which a trend analysis can unlock the expected number of days the compressor will still function before breaking down. This method has been developed by Wan-Jui Lee of the Dutch Railways and published as “Anomaly Detection and Severity Prediction of Air Leakage in Train Braking Pipes” in the International Journal of Prognostics and Health Management in 2017. We have implemented the methods as described in the paper using Python and Spark in a production environment.

More Related Content

Slideshows for you (19)

Similar to Predictive Maintenance at the Dutch Railways with Ivo Everts (20)

More from Databricks (20)

Predictive Maintenance at the Dutch Railways with Ivo Everts

  1. 1. Ivo Everts, GoDataDriven (Amsterdam, NL) Predictive Maintenance at the Dutch Railways Air leakage detection in train breaking pipes #DSSAIS16
  2. 2. #DSSAIS16 !2 Tünde Alkemade Ivo Everts Jelte Hoekstra Giovanni Lanzani Janelle Zoutkamp Stefan Hendriks Wouter Hordijk Inge Kalsbeek Wan-Jui Lee Inka Locht Kee Man Nick Oosterhof Margot Peters Cyriana Roelofs Mattijs Suurland
  3. 3. #DSSAIS16 Published paper in June 2017 'Contextual Air Leakage Detection in Train Braking Pipes' @ 'Advances in Artificial Intelligence: From Theory to Practice' !2 Tünde Alkemade Ivo Everts Jelte Hoekstra Giovanni Lanzani Janelle Zoutkamp Stefan Hendriks Wouter Hordijk Inge Kalsbeek Wan-Jui Lee Inka Locht Kee Man Nick Oosterhof Margot Peters Cyriana Roelofs Mattijs Suurland
  4. 4. #DSSAIS16 Implemented the paper using PySpark; teamwork for data delivery, proper productionizing, way of working Published paper in June 2017 'Contextual Air Leakage Detection in Train Braking Pipes' @ 'Advances in Artificial Intelligence: From Theory to Practice' !2 Tünde Alkemade Ivo Everts Jelte Hoekstra Giovanni Lanzani Janelle Zoutkamp Stefan Hendriks Wouter Hordijk Inge Kalsbeek Wan-Jui Lee Inka Locht Kee Man Nick Oosterhof Margot Peters Cyriana Roelofs Mattijs Suurland
  5. 5. #DSSAIS16 Consulting data scientist @ GoDataDriven MSc in AI; PhD in Computer Vision Did lot of scientific programming Now mostly Python, Hadoop, Spark, Keras Implemented the paper using PySpark; teamwork for data delivery, proper productionizing, way of working Published paper in June 2017 'Contextual Air Leakage Detection in Train Braking Pipes' @ 'Advances in Artificial Intelligence: From Theory to Practice' !2 Tünde Alkemade Ivo Everts Jelte Hoekstra Giovanni Lanzani Janelle Zoutkamp Stefan Hendriks Wouter Hordijk Inge Kalsbeek Wan-Jui Lee Inka Locht Kee Man Nick Oosterhof Margot Peters Cyriana Roelofs Mattijs Suurland
  6. 6. #DSSAIS16 Big Data @ NS (Dutch Railways) !3 SLT DDZ VIRM FLIRT SNG ICNG Pantomonitoring Gotcha/Quovadis Data conversie RTM operations MBN Maximo Werkorder OB/SB Big Data RTM analytics MonteurWerkvoorbereider 4M Helpdesk 7503 4M 7503 TRAXX
  7. 7. #DSSAIS16 Outline • Data ingestion • Air leakage detection • Other use cases !4
  8. 8. #DSSAIS16 Data ingestion Collecting data for Real Time Monitoring !5 SLT DDZ VIRM FLIRT SNG ICNG Pantomonitoring Gotcha/Quovadis Data conversie RTM operations MBN Maximo Werkorder OB/SB Big Data RTM analytics MonteurWerkvoorbereider 4M Helpdesk 7503 4M 7503 TRAXX
  9. 9. #DSSAIS16 Data ingestion Real time data Historic data !6 ~20GB per day; 22B records 22B records
  10. 10. #DSSAIS16 Data ingestion Real time data Historic data !6 ~20GB per day; 22B records 22B records
  11. 11. #DSSAIS16 Data ingestion Real time data Historic data !6 ~20GB per day; 22B records 22B records
  12. 12. #DSSAIS16 Data ingestion Real time data Historic data !6 # fetch args table_name = sys.argv[1] origin = sys.argv[2] # read xml string from stdin xml_string = ''.join(sys.stdin.readlines()) # parse 'm and print to stdout data = xml2table(xml_string, table_name, origin, start_dt, end_dt) print(data.to_csv(encoding='utf-8', header=False, index=False, ~20GB per day; 22B records 22B records
  13. 13. #DSSAIS16 Data ingestion Real time data Historic data !6 # fetch args table_name = sys.argv[1] origin = sys.argv[2] # read xml string from stdin xml_string = ''.join(sys.stdin.readlines()) # parse 'm and print to stdout data = xml2table(xml_string, table_name, origin, start_dt, end_dt) print(data.to_csv(encoding='utf-8', header=False, index=False, ~20GB per day; 22B records 22B records
  14. 14. #DSSAIS16 Data ingestion Real time data Historic data !6 # fetch args table_name = sys.argv[1] origin = sys.argv[2] # read xml string from stdin xml_string = ''.join(sys.stdin.readlines()) # parse 'm and print to stdout data = xml2table(xml_string, table_name, origin, start_dt, end_dt) print(data.to_csv(encoding='utf-8', header=False, index=False, ~20GB per day; 22B records 22B records def get_history_df(hdfs_binary_files, output_table_name): rdd = (hdfs_binary_files .map(unzip_xml) .flatMap(lambda x: xml2rows(x, output_table_name))) return None if rdd.isEmpty() else rdd.toDF() ... ... # read binary files and parse 'm hdfs_binary_files = (sc.binaryFiles('{}*'.format(file_group)) .persist(StorageLevel.MEMORY_AND_DISK_SER) .repartition(num_executors)) sdf = get_history_df(hdfs_binary_files, args.output_table_name)
  15. 15. #DSSAIS16 Data ingestion Real time data Historic data !6 # fetch args table_name = sys.argv[1] origin = sys.argv[2] # read xml string from stdin xml_string = ''.join(sys.stdin.readlines()) # parse 'm and print to stdout data = xml2table(xml_string, table_name, origin, start_dt, end_dt) print(data.to_csv(encoding='utf-8', header=False, index=False, >> Better suited for large datasets >> Easier to (re)partition and append >> Preference for a unified approach ~20GB per day; 22B records 22B records def get_history_df(hdfs_binary_files, output_table_name): rdd = (hdfs_binary_files .map(unzip_xml) .flatMap(lambda x: xml2rows(x, output_table_name))) return None if rdd.isEmpty() else rdd.toDF() ... ... # read binary files and parse 'm hdfs_binary_files = (sc.binaryFiles('{}*'.format(file_group)) .persist(StorageLevel.MEMORY_AND_DISK_SER) .repartition(num_executors)) sdf = get_history_df(hdfs_binary_files, args.output_table_name)
  16. 16. #DSSAIS16 Air leakage detection One of the cases using RTM data !7 SLT DDZ VIRM FLIRT SNG ICNG Pantomonitoring Gotcha/Quovadis Data conversie RTM operations MBN Maximo Werkorder OB/SB Big Data RTM analytics MonteurWerkvoorbereider 4M Helpdesk 7503 4M 7503 TRAXX
  17. 17. #DSSAIS16 Air leakage detection !8 Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans
  18. 18. #DSSAIS16 Air leakage detection !9 Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans We extract median compressor run- and idle- times per hour
  19. 19. #DSSAIS16 Air leakage detection !10 Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans
  20. 20. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans One model per train => compressor durations collected with a groupBy() and modelled in a UDF using scikit-learn to obtain the per-class probabilities: !11
  21. 21. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans One model per train => compressor durations collected with a groupBy() and modelled in a UDF using scikit-learn to obtain the per-class probabilities: Anomalies !11
  22. 22. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans # group data per train and collect data LOL train_data = (df.groupBy('rolling_stock_number') .agg(F.collect_list('median_duration').alias('median_duration'), F.collect_list('label').alias('label'))) # define UDF and fit model logistic_regression = F.udf(lambda X, y: sklearn_wrapper(X, y), ArrayType(FloatType())) model_data = (train_data.withColumn('logistic_model_parameters', logistic_regression(train_data.median_duration, train_data.label))) One model per train => compressor durations collected with a groupBy() and modelled in a UDF using scikit-learn to obtain the per-class probabilities: Anomalies !11
  23. 23. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans !12
  24. 24. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans Air leakage is defined as a cluster of anomalies, robust against outliers. For this we use a 'dynamic' variation of dbscan clustering, in a UDF, where the minimal cluster size is dependent on the logistic function: !13
  25. 25. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans Air leakage is defined as a cluster of anomalies, robust against outliers. For this we use a 'dynamic' variation of dbscan clustering, in a UDF, where the minimal cluster size is dependent on the logistic function: !13
  26. 26. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans !14
  27. 27. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans !15
  28. 28. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans # import relevant classes from sparkml from pyspark.ml.feature import VectorAssembler from pyspark.ml.regression import LinearRegression ... # prepare and run linear regression assembler = VectorAssembler(inputCols=x_cols, outputCol='features') input = assembler.transform(train_data).select(['label', 'features']) output = LinearRegression().setSolver('l-bfgs').fit(input) In the current implementation we train one regressor for all trains, so a pure Spark implementation is possible !16
  29. 29. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans Oh no, humans in the loop! >> Prognosis is difficult to explain and does not perform well enough >> Diagnosis does perform pretty good (~85% true positives; ~15% solved on time) >> Dedicated staff for bridging between algorithm and organisation >> We are observing recent air leakages that would have been prevented, so we're getting there SLT DDZ VIRM FLIRT SNG ICNG Pantomonitoring Gotcha/Quovadis Data conversie RTM operations MBN Maximo Werkorder OB/SB Big Data RTM analytics MonteurWerkvoorbereider 4M Helpdesk 7503 4M 7503 TRAXX !17
  30. 30. #DSSAIS16 Air leakage detection Data compressor logs Detection classification Diagnosis clustering Prognosis regression Operation humans Oh no, humans in the loop! >> Prognosis is difficult to explain and does not perform well enough >> Diagnosis does perform pretty good (~85% true positives; ~15% solved on time) >> Dedicated staff for bridging between algorithm and organisation >> We are observing recent air leakages that would have been prevented, so we're getting there SLT DDZ VIRM FLIRT SNG ICNG Pantomonitoring Gotcha/Quovadis Data conversie RTM operations MBN Maximo Werkorder OB/SB Big Data RTM analytics MonteurWerkvoorbereider 4M Helpdesk 7503 4M 7503 TRAXX !17
  31. 31. #DSSAIS16 More use cases >> Hot wheels: failing axle bearings >> Crowd control: passengers distribution >> Slippery tracks: traction detection >> Water usage: when you got to go you got to go >> ...and many more to come !18
  32. 32. Ivo Everts | ivoeverts@godatadriven.com | @ivoeverts Predictive Maintenance at the Dutch Railways Air leakage detection in train breaking pipes #DSSAIS16

×