How to Improve Data Labels and Feedback Loops Through High-Frequency Sensor Anomaly Detection by Using InfluxDB

Ezako is a startup specializing in time series analysis. It helps its clients detect anomalies and label their time series data, accelerating the labeling process and analyzing vast amounts of data from a variety of sensors in real time. The company provides anomaly insights that make data scientists' work easier. Ezako is the creator of Upalgo, a time series data management tool that uses AI to automatically detect anomalies in streaming data.

During this webinar, Ezako dives into how high-frequency sensors can generate huge amounts of data that can become desynchronized. This causes data quality issues, because the data may contain errors and glitches. Ezako uses machine learning, labeling and feedback loops to identify these errors. Discover how the company helps its clients improve data quality and reduce the number of validation mistakes.

  1. How to Improve Data Labels and Feedback Loops in Time Series Using InfluxDB
  2. Julien Muller
     AI expert, ex-IBM Big Data architect
     CTO at Ezako, creator of Upalgo
     https://www.linkedin.com/in/mullerjulien/
  3. We are Ezako
     - Based in Paris and in Sophia-Antipolis on the French Riviera
     - Startup specialized in AI and time series data, with expertise in machine learning
     - Creator of Upalgo
     - Sectors: aerospace, automotive, telecom
     - Data: sensor, telemetry and IoT data
     [Photo: Ezako offices in Sophia-Antipolis]
  4. Why Upalgo?
     Upalgo is a time series management suite covering anomaly detection and labeling.
     Time series and machine learning:
     - Large datasets
     - Temporality matters
     - We don't know the ground truth
  5. InfluxDB and Ezako
     We have been using InfluxDB since 2016. It is the fourth system (relational database, NoSQL, Hadoop ...) we have used to store TS data.
     Our issues were:
     - Big data (sampling) and high frequency
     - Slow access
     - Need for specific elements in the engine: windows and features
     - Need for a community to get answers (this is a very specialized field)
     Why did we choose InfluxDB?
     - Storage adapted to TS data
     - Better performance
     - Native nanosecond handling
     - No schema
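To make "native nanosecond handling" and "no schema" concrete, here is a minimal sketch that formats sensor samples as InfluxDB line protocol, the plain-text write format in which each point carries a measurement, optional tags, fields and an integer Unix timestamp in nanoseconds; nothing has to be declared before writing. The measurement, tag and field names (`vibration`, `sensor`, `accel_x`) are hypothetical, only float fields are handled, and escaping covers only commas, equals signs and spaces.

```python
def _escape_measurement(name: str) -> str:
    # Line protocol requires commas and spaces in measurement names to be escaped.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys must escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as a line-protocol string (float fields only)."""
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape_key(k)}={float(v)!r}" for k, v in fields.items())
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


def sample_lines(values, start_ns: int, rate_hz: int, sensor_id: str) -> list:
    """Turn evenly spaced samples into line-protocol points with nanosecond timestamps."""
    step_ns = 1_000_000_000 // rate_hz  # 20,000 ns between samples at 50 kHz
    return [
        # Hypothetical measurement, tag and field names, for illustration only.
        to_line_protocol("vibration", {"sensor": sensor_id}, {"accel_x": v}, start_ns + i * step_ns)
        for i, v in enumerate(values)
    ]


for line in sample_lines([0.12, -0.03, 0.5], 1_600_000_000_000_000_000, 50_000, "s1"):
    print(line)
```

At 50 kHz, consecutive timestamps differ by 20,000 ns, a spacing that a store limited to millisecond precision could not distinguish.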
  6. Upalgo architecture
     Our data challenges:
     - Continuous writes
     - Intensive reads during learning phases
     The architectural solution:
     - InfluxDB
  7. Machine Learning with InfluxDB
     Machine learning is challenging because of:
     - Continuous data inserts (sensors often sample at 1 kHz to 50 kHz)
     - Intensive metadata and feature calculations
     - Learning on huge datasets
     - Fast detection on small datasets
     - Unknown ground truth
     InfluxDB addresses these limitations.
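The 1 kHz to 50 kHz sampling range is what makes these datasets huge. A quick back-of-the-envelope calculation, counting points only (one value per sample, no assumption about bytes per point):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def points_per_day(rate_hz: int, sensors: int = 1) -> int:
    """Number of samples produced in one day by `sensors` sensors sampling at `rate_hz`."""
    return rate_hz * SECONDS_PER_DAY * sensors


for rate in (1_000, 50_000):
    print(f"{rate:,} Hz: {points_per_day(rate):,} points per sensor per day")
```

A single 50 kHz sensor produces 4,320,000,000 points a day, which is why learning on full datasets while keeping detection fast puts pressure on the storage layer.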
  8. An anomaly detection workflow
     Anomaly detection in time series is hard because two users won't necessarily share the same definition of an anomaly. A solid workflow is essential for good anomaly detection:
     ➔ insert data ➔ calculate features ➔ understand your data ➔ learn a model ➔ detect
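A minimal sketch of the insert, calculate features, learn and detect chain for one in-memory series. It assumes non-overlapping windows, mean and standard deviation as features, and a per-feature z-score model; these are illustrative stand-ins, not the features or models Upalgo uses, and the "understand your data" step (exploration and visualization) is left to the user.

```python
from statistics import mean, pstdev


def window_features(samples, window):
    """Calculate features: (mean, standard deviation) for each non-overlapping window.

    A trailing partial window is dropped.
    """
    starts = range(0, len(samples) - window + 1, window)
    return [(mean(samples[s:s + window]), pstdev(samples[s:s + window])) for s in starts]


def learn_model(features, min_sigma=1e-9):
    """Learn a model: the mean and spread of each feature over reference windows."""
    model = []
    for column in zip(*features):
        # min_sigma avoids division by zero when a feature never varies.
        model.append((mean(column), max(pstdev(column), min_sigma)))
    return model


def detect(features, model, threshold=3.0):
    """Detect: indices of windows where any feature is more than `threshold` spreads from its learned mean."""
    flagged = []
    for idx, row in enumerate(features):
        scores = [abs(value - mu) / sigma for value, (mu, sigma) in zip(row, model)]
        if max(scores) > threshold:
            flagged.append(idx)
    return flagged


# Learn on a perfectly periodic reference signal, then detect on new data with one glitch.
reference = [float(i % 5) for i in range(100)]
model = learn_model(window_features(reference, 10))
live = [float(i % 5) for i in range(30)]
live[15] = 40.0  # injected glitch in the second window
print(detect(window_features(live, 10), model))  # [1]
```

Because the reference here is perfectly periodic, any deviation stands out; on real sensor data the learned spreads would be wider and `threshold` would need tuning.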
  9. InfluxDB as intermediary storage
     Raw data must be stored as the reference. Adjusted data is also useful.
     ➔ We store several calculated time series for each raw time series.
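One way to picture "several calculated time series for each raw time series": derive new series from the raw one and keep them alongside it, leaving the raw data untouched as the reference. The two derived series here (a rolling mean and a first difference) and their names are hypothetical examples; the talk does not list which series Ezako stores.

```python
from collections import deque


def derive_series(points, window=4):
    """Compute derived series from one raw series, keeping the raw series as the reference.

    `points` is a time-ordered list of (timestamp_ns, value) pairs. Every derived
    series reuses the raw timestamps, so it can be stored next to the raw data.
    """
    derived = {"raw": list(points), "rolling_mean": [], "diff": []}
    recent = deque(maxlen=window)  # last `window` raw values
    previous = None
    for ts, value in points:
        recent.append(value)
        if len(recent) == window:
            derived["rolling_mean"].append((ts, sum(recent) / window))
        if previous is not None:
            derived["diff"].append((ts, value - previous))
        previous = value
    return derived


series = derive_series([(0, 1.0), (10, 3.0), (20, 5.0), (30, 7.0)], window=2)
print(series["rolling_mean"])  # [(10, 2.0), (20, 4.0), (30, 6.0)]
print(series["diff"])          # [(10, 2.0), (20, 2.0), (30, 2.0)]
```

Each derived series could then be written to InfluxDB as its own field or measurement, for example tagged with the identifier of its raw series, so features are computed once and read back at learning time instead of being recomputed.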
  10. An anomaly detection workflow
      [Diagram: Raw Data, InfluxDB, Data processing, Meta-data extraction, Feature calculation, Learning, Validated model, Anomaly detection, Labeling, Label spreading, with three Visualize steps]
  11. What is labeling?
      Labeling is the activity of attaching one or more labels to data to identify certain of its properties or characteristics. Labeled data considerably improves learning accuracy. Labeling is a time-consuming but crucial part of training machine learning algorithms, and data scientists and experts spend most of their time on this repetitive task.
  12. Challenge 1
      How do you put 20,000 labels on 20 million data points in a few minutes?
      1. A user-friendly UI
      2. Automatic label spreading with machine learning
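The numbers in this challenge show why manual labeling alone cannot keep up. Taking "a few minutes" as five minutes (an assumption):

```python
LABELS = 20_000
DATA_POINTS = 20_000_000
MINUTES = 5  # assumption: "a few minutes" taken as five

points_per_label = DATA_POINTS // LABELS      # data points covered by each label, on average
labels_per_second = LABELS / (MINUTES * 60)   # labeling rate needed to finish in time

print(points_per_label)              # 1000
print(round(labels_per_second, 1))   # 66.7
```

Roughly 67 labels every second is far beyond what a person can place by hand, which is why the slide pairs the UI with automatic label spreading.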
  13. Labeling is interesting because
      ➔ Experts want more information on their data
      ➔ Supervised machine learning needs labels
      ➔ Manual labeling is exhausting
  14. Ergonomics can increase labeling speed 15-fold
  15. AI-based label conflict management
      All labels are checked for conflicts.
      Benefit: fewer labeling errors.
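The talk describes conflict checking as AI-based without giving details. As a simpler, rule-based stand-in, the sketch below flags pairs of labels whose time ranges overlap and whose tags a user has declared mutually exclusive (for example `normal` and `anomaly`). The `Label` structure and the exclusivity rule are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    start_ns: int  # inclusive
    end_ns: int    # exclusive
    tag: str


def find_conflicts(labels, exclusive_pairs):
    """Return pairs of overlapping labels whose tags are declared mutually exclusive."""
    exclusive = {frozenset(pair) for pair in exclusive_pairs}
    ordered = sorted(labels, key=lambda label: label.start_ns)
    conflicts = []
    for i, a in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            b = ordered[j]
            if b.start_ns >= a.end_ns:
                break  # sorted by start, so no later label can overlap `a`
            if frozenset((a.tag, b.tag)) in exclusive:
                conflicts.append((a, b))
    return conflicts


labels = [
    Label(0, 100, "normal"),
    Label(50, 150, "anomaly"),
    Label(200, 300, "anomaly"),
    Label(250, 260, "spike"),  # overlaps an anomaly label, but the pair is not exclusive
]
conflicts = find_conflicts(labels, [("normal", "anomaly")])
print(conflicts)  # one conflict: 'normal' at 0-100 overlaps 'anomaly' at 50-150
```

Sorting by start time and stopping the inner scan as soon as labels stop overlapping keeps the check cheap when labels are short relative to the series, which matters with tens of thousands of labels.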
  16. UI-based labeling and tag management
      Always visible and accessible one-click labeling.
      [Screenshot callouts: confirming and discarding label propositions; tag management; tags; labels]
  17. Label propagation can increase labeling speed 15-fold
      The idea is to label the entire dataset with AI-based automatic label propagation.
      Benefit: much faster labeling.
  18. Label propagation
      Propagated labels ready to be confirmed.
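The slides do not say how Upalgo propagates labels. As one simple, hypothetical stand-in, the sketch below proposes a label for each unlabeled window by copying the label of the nearest user-labeled window in feature space, provided it lies within `max_distance`. Proposals are returned separately from user labels so they can be confirmed or discarded, as on the slide.

```python
import math


def propagate_labels(features, seed_labels, max_distance):
    """Propose labels for unlabeled windows from their nearest labeled neighbour.

    features: list of feature vectors (tuples of floats), one per window.
    seed_labels: dict mapping window index -> label assigned by a user.
    Returns a dict mapping window index -> (proposed label, distance); windows
    farther than max_distance from every labeled window get no proposal.
    """
    proposals = {}
    for idx, vec in enumerate(features):
        if idx in seed_labels:
            continue  # user labels are never overwritten
        best = None
        for seed_idx, label in seed_labels.items():
            dist = math.dist(vec, features[seed_idx])
            if dist <= max_distance and (best is None or dist < best[1]):
                best = (label, dist)
        if best is not None:
            proposals[idx] = best
    return proposals


features = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.0), (10.0, 9.0), (50.0, 50.0)]
proposals = propagate_labels(features, {0: "normal", 1: "spike"}, max_distance=2.0)
print(proposals)  # {2: ('normal', 0.5), 3: ('spike', 1.0)}
```

Keeping the distance with each proposal lets a UI show the least certain propositions first for confirmation.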
  19. Challenge 2
      Create an anomaly detection workflow based on a user feedback loop, and optimize algorithm performance through user feedback.
  20. Feedback loops are interesting because they involve:
      ➔ Continuous relearning
      ➔ Read challenges on big datasets
      ➔ UI complexity
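A minimal sketch of continuous relearning from user feedback, assuming a single-value z-score detector: when a user rejects a detection as normal, the value joins the baseline and the model is relearned. The class and its rules are illustrative assumptions, not Upalgo's implementation.

```python
from statistics import mean, pstdev


class FeedbackDetector:
    """Hypothetical z-score detector that relearns its baseline from user feedback."""

    def __init__(self, reference, threshold=3.0):
        self.normal = list(reference)  # values considered normal so far
        self.threshold = threshold
        self._relearn()

    def _relearn(self):
        # Recompute the baseline from every value known to be normal.
        self.mu = mean(self.normal)
        self.sigma = max(pstdev(self.normal), 1e-9)

    def is_anomaly(self, value):
        return abs(value - self.mu) / self.sigma > self.threshold

    def feedback(self, value, user_says_anomaly):
        # A rejected detection (the user says "normal") becomes part of the baseline.
        if not user_says_anomaly:
            self.normal.append(value)
            self._relearn()


detector = FeedbackDetector([10.0, 11.0, 9.0, 10.0])
print(detector.is_anomaly(13.0))  # True
detector.feedback(13.0, user_says_anomaly=False)
print(detector.is_anomaly(13.0))  # False
```

Relearning from the full baseline on every piece of feedback is simple but illustrates the read challenge on big datasets: at scale, an incremental update or a periodic batch relearn would likely be needed.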
  21. Importance of UI in feedback loops
      [Screenshot callout: an anomaly]
  22. A scoring system to optimize the model configuration
      Use a scoring system to optimize the choice of algorithm and features.
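The slide does not name the score. One common choice, assumed here, is the F1 score of each configuration's detections against anomalies that users have validated; the highest-scoring configuration wins. The two configurations below are hypothetical threshold detectors.

```python
def f1_score(predicted, validated):
    """F1 score of predicted anomaly indices against user-validated anomaly indices."""
    predicted, validated = set(predicted), set(validated)
    true_positives = len(predicted & validated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(validated)
    return 2 * precision * recall / (precision + recall)


def best_configuration(configs, data, validated):
    """Score each named detector on `data` and return (best name, all scores)."""
    scores = {name: f1_score(detect(data), validated) for name, detect in configs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical configurations: same feature (absolute value), different thresholds.
configs = {
    "abs>1": lambda xs: [i for i, x in enumerate(xs) if abs(x) > 1],
    "abs>5": lambda xs: [i for i, x in enumerate(xs) if abs(x) > 5],
}
best, scores = best_configuration(configs, [0.2, 8.0, -1.5, 0.1, -9.0], validated=[1, 4])
print(best)  # abs>5
```

Here `abs>1` also flags the unvalidated point at index 2, which lowers its precision and therefore its F1 score.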
  23. To sum up
      Time series labeling and feedback management is complex and difficult. The solution is to:
      - adopt a TS database such as InfluxDB
      - create a user-friendly UI
      - apply propagation tools to speed things up
      - implement an efficient workflow
      Our experience with InfluxDB:
      - pretty smooth
      - plug-and-forget mentality
  24. Migrating to InfluxDB 2.0?
      ➔ InfluxDB IOx
      ➔ Influx query language: Flux -> new functions ...
  25. Q&A
  26. Julien Muller
      julien.muller@ezako.com
      +33 6 65 06 64 66
      www.ezako.com
