Introduction to eTaPR for HAICon (English) - DACON AI 데이콘
This document proposes an enhanced accuracy metric called eTaPR for evaluating anomaly detection methods on time-series data. It addresses limitations of point-wise precision and recall on time series by treating anomalies and predictions as ranges rather than single points. The key ideas are (1) scoring anomalies based on the portion detected, (2) scoring predictions based on the portion identifying anomalies correctly, and (3) only scoring matches between complete anomaly and prediction ranges. The proposed metrics are enhanced time-series recall (eTaR), enhanced time-series precision (eTaP), and their harmonic mean, the F1 score.
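The first two ideas in the summary above (scoring by the portion of a range that is covered) can be illustrated with a minimal Python sketch. This is not the eTaPR definition from the slides: the function names, the closed integer `(start, end)` range format, and the assumption that ranges within each list do not overlap one another are all hypothetical choices made for illustration, and the third idea (the matching rule between ranges) is deliberately left out because the summary does not specify it.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # closed integer interval (start, end); an assumed format


def length(r: Range) -> int:
    """Number of time steps in a closed range."""
    return r[1] - r[0] + 1


def overlap(a: Range, b: Range) -> int:
    """Number of time steps shared by two closed ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covered_fraction(target: Range, others: List[Range]) -> float:
    """Fraction of `target` covered by `others` (assumed mutually non-overlapping)."""
    covered = sum(overlap(target, o) for o in others)
    return min(1.0, covered / length(target))


def range_recall(anomalies: List[Range], predictions: List[Range]) -> float:
    """Mean detected portion per anomaly range (simplified, not eTaR itself)."""
    if not anomalies:
        return 0.0
    return sum(covered_fraction(a, predictions) for a in anomalies) / len(anomalies)


def range_precision(anomalies: List[Range], predictions: List[Range]) -> float:
    """Mean correct portion per prediction range (simplified, not eTaP itself)."""
    if not predictions:
        return 0.0
    return sum(covered_fraction(p, anomalies) for p in predictions) / len(predictions)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with one anomaly at `(10, 19)` and one prediction at `(15, 24)`, half of the anomaly is detected and half of the prediction is correct, so both scores are 0.5. A point-wise metric would give the same numbers here; the range-based view differs once several ranges of unequal length are involved, because each range contributes equally regardless of its length.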
Introduction to eTaPR for HAICon (Korean) - DACON AI 데이콘
The document proposes an enhanced method (eTaPR) for evaluating the accuracy of anomaly detection models on time-series data. It identifies limitations of existing evaluation methods when applied to time-series, where anomalies and predictions can cover ranges rather than single points. The eTaPR method accounts for this by scoring predictions and anomalies based on the extent of their detected or predicted ranges, and only scoring detections as successful if the full ranges overlap. This provides a more robust evaluation of models' ability to detect anomalies in time-series data.
Our GOAL
Data competition platforms like this exist overseas, but not in Korea. We wanted a platform that runs competitions continuously, moving beyond the one-off, opaque contests held by domestic public institutions and individual companies. We now work with Fintech companies to provide financial data and prize money, and we run data competitions that cover both data science and data engineering.
The document summarizes the approach taken to detect collisions by analyzing vibration data in a machine learning competition hosted on Dacon.io. It describes initial data processing steps including Fourier transforms and calculating onset times. Models tested included XGB, DNNs, and 1D CNNs applied directly to raw waveform data. Key aspects that improved performance were using L1 loss with Adam optimization, quantile transformations of the output, and developing separate models for position, mass and velocity rather than a single model. The best score achieved was around 0.0015-0.0017.
This document summarizes the features, models, feature selection, and final ensemble used in an R 5th Private Solution. It describes various statistics and ratio features calculated on the data. It explains that light gradient boosting machines (LGBM) are used with different feature sets and training on whole or subsetted data based on rho values. Feature selection uses permutation importance. The final ensemble averages 4 models for hhb and 3 for other targets, varying the feature sets and whether training was on whole or subsetted data.