Scaling AutoML-Driven Anomaly Detection With Luminaire


Organizations rely heavily on time series metrics to measure and model key aspects of operational and business performance. The ability to reliably detect issues with these metrics is imperative to identifying early indicators of major problems before they become pervasive. This is a difficult machine learning and systems problem because temporal patterns are complex, ever changing, and often very noisy, traditionally requiring significant manual configuration and model maintenance.

At Zillow, we have built an orchestration framework around Luminaire, our open-source Python library for hands-off time series anomaly detection. Luminaire provides a suite of models and built-in AutoML capabilities, which we run on Spark for distributed training and scoring of thousands of metrics. In this talk, we will cover the architecture of this framework and the performance of the Luminaire package in terms of detection and prediction accuracy as well as runtime efficiency.


  1. Scaling AutoML-driven Anomaly Detection with Luminaire
     Sayan Chakraborty, Smit Shah
  2. Who We Are
     Data Governance Platform Team @ Zillow
     ● Sayan Chakraborty, Senior Applied Scientist
     ● Smit Shah, Senior Software Development Engineer, Big Data
  3. Agenda
     ● What is Zillow?
     ● Why Monitor Data Quality
     ● Data Quality Challenges
     ● Luminaire and Scaling
     ● Key Takeaways
  4. Zillow
  5. About Zillow
     ● Reimagining real estate to make it easier to unlock life’s next chapter
     ● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and a nearly seamless end-to-end service
     ● Most-visited real estate website in the United States
     * As of Q4-2020
  6. Why Monitor Data Quality
  7. Why Monitor Data Quality?
     ● Data fuels many customer-facing and internal services at Zillow that rely on high-quality data
        ○ Zestimate
        ○ Zillow Offers
        ○ Zillow Premier Agent
        ○ Econ and many more
     ● Reliable performance of ML models and services requires a certain level of data quality
  8. Why Detect Anomalies?
     Anomaly: a data instance or behavior significantly different from the ‘regular’ patterns
     ● Complex
     ● Time-sensitive
     ● Inevitable
     Catching anomalies in important metrics helps keep our business healthy
  9. Ways to Monitor Data Quality
     Rule Based
     ● Domain experts set pre-specified rules or thresholds
        ○ Example: Percent of null data should be less than 2% per day for a given metric
     ● Less complicated to set up and easy to interpret
     ● Works well when the properties of the data are simple and remain stationary over time
     ML Based
     ● Rules are set through mathematical modeling
     ● Works well when the properties of the data are complex and change over time
     ● A more hands-off approach
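The two approaches on this slide can be contrasted in a few lines of Python. This is a minimal sketch, not Zillow's implementation: the function names are hypothetical, the 2% limit comes from the slide's example, and the ML-based side is reduced to a threshold learned from the metric's own history (mean ± k standard deviations), which is far simpler than Luminaire's models.

```python
from statistics import mean, stdev


def null_rate(values):
    """Fraction of one day's values that are missing (None)."""
    if not values:
        return 1.0  # hypothetical choice: an empty day counts as fully null
    return sum(v is None for v in values) / len(values)


def rule_based_violation(values, max_null_rate=0.02):
    """Rule based: a domain expert fixes the threshold up front
    ("percent of null data should be less than 2% per day")."""
    return null_rate(values) >= max_null_rate


def learned_threshold_violation(history, today, k=3.0):
    """Toy ML-based stand-in: the threshold is derived from the data itself,
    so it moves when the metric's level or spread changes."""
    mu, sigma = mean(history), stdev(history)
    return abs(today - mu) > k * sigma


day = [1.0] * 97 + [None] * 3                      # 3% nulls
print(rule_based_violation(day))                   # True
history = [100, 102, 98, 101, 99]
print(learned_threshold_violation(history, 110))   # True: 10 > 3 * 1.58
```

The rule never changes unless someone edits it, while the learned threshold widens or narrows with the data, which is the trade-off the slide describes.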
  10. Data Quality Challenges
  11. Data Quality is Context Dependent
      ● Depends on the use case
      ● Depends on the reference time frame under consideration
         ○ Example: the same fluctuation can be interpreted differently when it is compared against a shorter vs. a longer reference time frame
      ● Depends on externalities such as holidays, product launches, market-specific events, etc.
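To make the reference-time-frame point concrete, the sketch below scores the same observation against two trailing windows. It is a hypothetical illustration built on synthetic data, plain z-scores, and a conventional |z| > 3 cut-off (all assumptions). It does not show how Luminaire handles context.

```python
from statistics import mean, stdev


def zscore(window, value):
    """How many standard deviations `value` sits from the window's mean."""
    return (value - mean(window)) / stdev(window)


# Hypothetical daily metric: ~100 for 83 days, then a new level ~130 for 7 days.
history = [100 + i % 3 for i in range(83)] + [130 + i % 3 for i in range(7)]
today = 132

short_z = zscore(history[-7:], today)   # last week: the new level looks normal
long_z = zscore(history[-90:], today)   # last quarter: today looks extreme
print(f"7-day z = {short_z:.1f}, 90-day z = {long_z:.1f}")
# 7-day z = 1.3, 90-day z = 3.5
```

With a |z| > 3 rule, the same value is normal against the last week but anomalous against the last quarter, so the choice of reference window is part of the definition of "anomalous".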
  12. Challenges
      ● Modeling
         ○ Wide range of time series patterns from different data sources: one model doesn’t fit all
         ○ The definition of an anomaly changes at different levels of aggregation of the same data
      ● Scaling and Standardization
         ○ Everyone (Analyst, PM, DE) should be able to use ML for anomaly detection and get trustworthy data (but not everyone is an ML expert)
         ○ Scalability is required to handle large amounts of data across teams
  13. Wishlist for the System
      ● Able to catch any data irregularities
      ● Scales to large amounts of data and metrics
      ● Minimal configuration
      ● Minimal maintenance over time
      No existing solution meets the above requirements
  14. Luminaire
  15. Luminaire Python Package
      Key Features
      ● Integrated with Different Models
      ● AutoML Built-in
      ● Proven to Outperform Many Existing Methods
      ● Time Series Data Profiling Enabled
      ● Built for Batch and Streaming Use Cases
      GitHub: https://github.com/zillow/luminaire
      Tutorials: https://zillow.github.io/luminaire/
      Scientific Paper (IEEE BigData 2020): Building an Automated and Self-Aware Anomaly Detection System (arxiv link)
  16. Luminaire Components
      AutoML
      ● Training Components
         ○ Data Profiling / Preprocessing
         ○ Batch Data Modeling
         ○ Streaming Data Modeling
      Scoring/Alerting
      ● Scoring Components
         ○ Pull Batch Model
         ○ Pull Streaming Model
  17. Data Profiling / Preprocessing
      >>> from luminaire.exploration.data_exploration import DataExploration
      >>> # data: pandas DataFrame holding the daily time series, indexed by date
      >>> de_obj = DataExploration(freq='D', data_shift_truncate=False, is_log_transformed=True, fill_rate=0.9)
      >>> data, pre_prc = de_obj.profile(data)
      >>> print(pre_prc)
      {'success': True, 'trend_change_list': ['2020-04-01 00:00:00'], 'change_point_list': ['2020-03-16 00:00:00'], 'is_log_transformed': 1, 'min_ts_mean': None, 'ts_start': '2020-01-01 00:00:00', 'ts_end': '2020-06-07 00:00:00'}
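The slide does not define its parameters, so the sketch below is only a plain-Python reading of two of them. It assumes that `fill_rate` is the minimum share of non-missing points a series needs, and that `is_log_transformed` applies a log(x + 1) transform. The batch scoring output later in the deck supports the latter: it reports LogTransformedAdjustedActual = 7.6014... for an actual of 2000, and ln(2001) gives the same value. The function names are hypothetical. Luminaire's `DataExploration.profile` also detects trend changes and change points, which this sketch omits.

```python
import math
from datetime import date, timedelta


def fill_rate(observations, start, end):
    """Fraction of calendar days in [start, end] with a non-missing value."""
    n_days = (end - start).days + 1
    present = sum(
        1 for d, v in observations.items() if start <= d <= end and v is not None
    )
    return present / n_days


def profile_sketch(observations, start, end, min_fill_rate=0.9, log_transform=True):
    """Hypothetical profiling step: reject sparse series, optionally log-transform."""
    rate = fill_rate(observations, start, end)
    if rate < min_fill_rate:
        return {"success": False, "fill_rate": rate}
    series = {
        d: (math.log1p(v) if log_transform else v)  # log(x + 1); assumes x > -1
        for d, v in sorted(observations.items())
        if v is not None
    }
    return {"success": True, "fill_rate": rate, "series": series}


start = date(2020, 1, 1)
obs = {start + timedelta(days=i): 100.0 for i in range(10) if i != 4}  # one gap
result = profile_sketch(obs, start, start + timedelta(days=9))
print(result["success"], result["fill_rate"])  # True 0.9
```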
  18. Training - Batch
      >>> from luminaire.model.lad_structural import LADStructuralModel
      >>> # data, pre_prc: outputs of de_obj.profile(...) from the profiling step
      >>> hyper_params = {"include_holidays_exog": True, "is_log_transformed": False, "max_ft_freq": 5, "p": 3, "q": 3}
      >>> lad_struct_obj = LADStructuralModel(hyper_params=hyper_params, freq='D')
      >>> success, model_date, model = lad_struct_obj.train(data=data, **pre_prc)
      >>> print(success, model_date, model)
      (True, '2020-06-07 00:00:00', <luminaire_models.model.lad_structural.LADStructuralModel object at 0x7f97e127d320>)
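Assuming the "LAD" in LADStructuralModel stands for least absolute deviation, the sketch below shows why that loss suits noisy business metrics. A line fit by minimizing absolute errors, rather than squared errors, barely moves in response to one extreme point. The sketch substitutes a textbook technique (iteratively reweighted least squares on a straight line) and models none of the slide's hyperparameters (p, q, max_ft_freq, include_holidays_exog). It is not Luminaire's implementation.

```python
def weighted_line_fit(t, y, w):
    """Weighted least squares fit of y ≈ a + b * t."""
    sw = sum(w)
    mt = sum(wi * ti for wi, ti in zip(w, t)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - mt) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - mt) * (yi - my) for wi, ti, yi in zip(w, t, y))
    b = sxy / sxx
    return my - b * mt, b


def lad_line_fit(t, y, n_iter=100, eps=1e-6):
    """Least absolute deviation fit via iteratively reweighted least squares:
    each pass reweights points by 1 / |residual|, so points with large
    residuals (outliers) lose influence."""
    w = [1.0] * len(t)
    for _ in range(n_iter):
        a, b = weighted_line_fit(t, y, w)
        w = [1.0 / max(abs(yi - (a + b * ti)), eps) for ti, yi in zip(t, y)]
    return a, b


t = list(range(10))
y = [2 + 3 * ti for ti in t]
y[-1] = 500  # a single extreme outlier
print(weighted_line_fit(t, y, [1.0] * len(t)))  # ordinary least squares: slope pulled far above 3
print(lad_line_fit(t, y))                       # close to (2, 3)
```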
  19. Training - Streaming
      >>> from luminaire.model.window_density import WindowDensityHyperParams, WindowDensityModel
      >>> from luminaire.exploration.data_exploration import DataExploration
      >>> config = WindowDensityHyperParams().params    # default window density hyperparameters
      >>> de_obj = DataExploration(**config)
      >>> data, pre_prc = de_obj.stream_profile(df=data)
      >>> config.update(pre_prc)                         # merge the profiling output into the model config
      >>> wdm_obj = WindowDensityModel(hyper_params=config)
      >>> success, training_end, model = wdm_obj.train(data=data)
      >>> print(success, training_end, model)
      True 2020-07-03 00:00:00 <luminaire.model.window_density.WindowDensityModel object at 0x7fb6fab80b00>
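Streaming models compare whole windows of data rather than single points. The sketch below illustrates that idea under stated assumptions. It compares the value histogram of a current window with that of a baseline window, using the Hellinger distance and a hypothetical 0.5 cut-off. This is a simplified stand-in chosen for illustration. The slide does not describe how WindowDensityModel measures density differences, so this should not be read as Luminaire's algorithm.

```python
import math
from collections import Counter


def window_histogram(values, edges):
    """Normalized histogram over fixed bin edges; values beyond the outer
    edges fall into the first or last bin."""
    counts = Counter(sum(v >= e for e in edges[1:-1]) for v in values)
    return [counts[i] / len(values) for i in range(len(edges) - 1)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def window_is_anomalous(baseline, current, edges, threshold=0.5):
    """Flag the current window if its value distribution drifts too far from the baseline's."""
    distance = hellinger(window_histogram(baseline, edges), window_histogram(current, edges))
    return distance > threshold


edges = [0, 2, 4, 6, 8, 10]
baseline = list(range(10)) * 3        # spread evenly over all bins
print(window_is_anomalous(baseline, list(range(10)), edges))  # False
print(window_is_anomalous(baseline, [9] * 10, edges))         # True
```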
  20. AutoML - Configuration Optimization
      >>> from luminaire.optimization.hyperparameter_optimization import HyperparameterOptimization
      >>> hopt_obj = HyperparameterOptimization(freq='D')
      >>> opt_config = hopt_obj.run(data=data)
      >>> print(opt_config)
      {'LuminaireModel': 'LADStructuralModel', 'data_shift_truncate': 0, 'fill_rate': 0.742353444620679, 'include_holidays_exog': 1, 'is_log_transformed': 1, 'max_ft_freq': 2, 'p': 1, 'q': 1}
      >>> # Instantiate the model class selected by AutoML
      >>> import importlib
      >>> model_class_name = opt_config['LuminaireModel']
      >>> module = importlib.import_module('luminaire.model')
      >>> model_class = getattr(module, model_class_name)
      >>> model_object = model_class(hyper_params=opt_config, freq='D')
      >>> # training_data, pre_prc: the profiled series and its profiling output (as on the Data Profiling slide)
      >>> success, model_date, trained_model = model_object.train(data=training_data, **pre_prc)
      >>> print(success, model_date, trained_model)
      (True, '2020-06-07 00:00:00', <luminaire_models.model.lad_structural.LADStructuralModel object at 0x7fe2b47a7978>)
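Here, AutoML boils down to searching a configuration space for the setting that scores best on the data. The sketch below shows that loop using plain random search, a deliberately simpler technique swapped in for illustration. Luminaire's HyperparameterOptimization uses its own search strategy and also selects the model class (`LuminaireModel`). The search space and toy objective are hypothetical.

```python
import random

# Hypothetical search space reusing keys that appear in the slide's opt_config.
SEARCH_SPACE = {
    "is_log_transformed": [0, 1],
    "max_ft_freq": [2, 3, 4, 5],
    "p": [1, 2, 3],
    "q": [1, 2, 3],
}


def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations uniformly and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(choices) for name, choices in space.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_objective(config):
    """Stand-in for a validation error; a real objective would train and
    evaluate a model with `config` on held-out data."""
    return abs(config["p"] - 2) + abs(config["q"] - 1) + abs(config["max_ft_freq"] - 3)


best, loss = random_search(toy_objective, SEARCH_SPACE)
print(best, loss)
```

Random search is easy to parallelize but ignores earlier results. Model-based optimizers use past trials to decide where to sample next, which matters when each trial means training a model.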
  21. Scoring - Batch
      >>> # score an observed value of 2000 for 2020-06-08 with the trained batch model
      >>> model.score(2000, '2020-06-08')
      {'Success': True, 'IsLogTransformed': 1, 'LogTransformedAdjustedActual': 7.601402334583733, 'LogTransformedPrediction': 7.85697078664991, 'LogTransformedStdErr': 0.05909378128162875, 'LogTransformedCILower': 7.759770166178546, 'LogTransformedCIUpper': 7.954171407121274, 'AdjustedActual': 2000.000000000015, 'Prediction': 1913.333800801316, 'StdErr': 111.1165409184448, 'CILower': 1722.81265596681, 'CIUpper': 2093.854945635823, 'ConfLevel': 90.0, 'ExogenousHolidays': 0, 'IsAnomaly': False, 'IsAnomalyExtreme': False, 'AnomalyProbability': 0.9616869199903785, 'DownAnomalyProbability': 0.21915654000481077, 'UpAnomalyProbability': 0.7808434599951892, 'ModelFreshness': 0.1}
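Several fields in this output can be checked by hand. Assuming a two-sided normal interval, the log-space bounds equal LogTransformedPrediction ± z · LogTransformedStdErr, with z ≈ 1.645 for ConfLevel 90. LogTransformedAdjustedActual equals ln(2000 + 1). The sketch below reproduces both with the standard library; `confidence_interval` is a hypothetical helper, not a Luminaire function. Note that the log-space actual (7.6014) lies below LogTransformedCILower, yet IsAnomaly is False. The flag is therefore not simply "outside the interval", and the slide does not say how it is derived.

```python
import math
from statistics import NormalDist


def confidence_interval(prediction, std_err, conf_level=90.0):
    """Two-sided normal interval: prediction ± z * std_err."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 200.0)  # about 1.645 for 90%
    return prediction - z * std_err, prediction + z * std_err


# Log-space values copied from the scoring output above.
lower, upper = confidence_interval(7.85697078664991, 0.05909378128162875, 90.0)
print(round(lower, 6), round(upper, 6))  # 7.75977 7.954171
print(round(math.log(2000 + 1), 6))      # 7.601402
```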
  22. Scoring - Streaming
      >>> from luminaire.exploration.data_exploration import DataExploration
      >>> # model: trained WindowDensityModel; data: new window of observations to score
      >>> freq = model._params['freq']
      >>> de_obj = DataExploration(freq=freq)
      >>> processed_data, pre_prc = de_obj.stream_profile(df=data, impute_only=True, impute_zero=True)
      >>> score, scored_window = model.score(processed_data)
      >>> print(score)
      {'Success': True, 'ConfLevel': 99.9, 'IsAnomaly': True, 'AnomalyProbability': 1.0}
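The streaming scoring call passes `impute_only=True` and `impute_zero=True` to `stream_profile`. Reading those flags by name only (an assumption, since the slide does not document them), the profiling step fills gaps in the incoming window rather than re-profiling it, and it fills them with zeros. That suits count-like streams, where "no record" means "nothing happened". The hypothetical helper below sketches that gap filling in plain Python.

```python
from datetime import datetime, timedelta


def impute_zero(points, start, end, step):
    """Regularize a window: one entry per `step` from start to end (inclusive),
    with 0 wherever no observation was recorded."""
    filled = []
    t = start
    while t <= end:
        filled.append((t, points.get(t, 0)))
        t += step
    return filled


hour = timedelta(hours=1)
t0 = datetime(2020, 7, 4, 0, 0)
window = {t0: 12, t0 + 2 * hour: 9}  # nothing arrived at 01:00
print(impute_zero(window, t0, t0 + 2 * hour, hour))
```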
  23. Scaling
  24. Scaling - Distributed Training/Scoring (slides 24-28 share this title and reference)
      Reference: https://www.zillow.com/tech/anomaly-detection-at-zillow-using-luminaire/
  29. Scaling - Distributed using Spark
      Training input:
      metrics | time_series [data-date, observed-value]                           | run_date
      met_1   | [[2021-01-01, 125], [2021-01-02, 135], [2021-01-03, 140], ...]    | 2021-02-01 00:00:00
      met_2   | [[2021-01-01, 0.17], [2021-01-02, 0.19], [2021-01-03, 0.22], ...] | 2021-02-01 00:00:00
      UDF (Train) output:
      metrics | time_series [data-date, observed-value]                           | run_date            | model_object
      met_1   | [[2021-01-01, 125], [2021-01-02, 135], [2021-01-03, 140], ...]    | 2021-02-01 00:00:00 | <object_met_1>
      met_2   | [[2021-01-01, 0.17], [2021-01-02, 0.19], [2021-01-03, 0.22], ...] | 2021-02-01 00:00:00 | <object_met_2>
      UDF (Score) input:
      metrics | time_series [data-date, observed-value]  | run_date            | model_object
      met_1   | [[2021-04-01, 115], [2021-04-02, 113]]   | 2021-04-02 00:00:00 | <object_met_1>
      met_2   | [[2021-04-01, 0.45], [2021-04-02, 0.36]] | 2021-04-02 00:00:00 | <object_met_2>
      UDF (Score) output:
      metrics | time_series [data-date, observed-value]  | run_date            | score_results
      met_1   | [[2021-04-01, 115], [2021-04-02, 113]]   | 2021-04-02 00:00:00 | [{'success': True, 'AnomalyProbability': 0.85, ...}, ...]
      met_2   | [[2021-04-01, 0.45], [2021-04-02, 0.36]] | 2021-04-02 00:00:00 | [{'success': True, 'AnomalyProbability': 0.995, ...}, ...]
      * These values are simulated
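The Spark slide above follows a one-row-per-metric pattern. A training function maps each row's history to a model object, and a scoring function maps each row's new points plus that model to score results. The sketch below mirrors this pattern in plain Python. It uses the slide's simulated values, a thread pool as a local stand-in for Spark, and a toy mean/standard-deviation model in place of Luminaire. The function names and the toy probability are hypothetical. In an actual Spark job, the two functions would be applied as UDFs over the metrics DataFrame, which this sketch does not show.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, stdev


def train_udf(row):
    """Stand-in for UDF (Train): fit a toy per-metric model (mean and stdev)."""
    values = [v for _, v in row["time_series"]]
    return {**row, "model_object": {"mean": mean(values), "std": stdev(values)}}


def score_udf(row):
    """Stand-in for UDF (Score): score each new point against its metric's model."""
    model = row["model_object"]
    results = []
    for day, value in row["time_series"]:
        z = (value - model["mean"]) / model["std"]
        prob = 2 * NormalDist().cdf(abs(z)) - 1  # mass within ±|z|; near 1 for extreme points
        results.append({"success": True, "date": day, "AnomalyProbability": prob})
    return {**row, "score_results": results}


train_rows = [
    {"metrics": "met_1", "time_series": [("2021-01-01", 125), ("2021-01-02", 135), ("2021-01-03", 140)]},
    {"metrics": "met_2", "time_series": [("2021-01-01", 0.17), ("2021-01-02", 0.19), ("2021-01-03", 0.22)]},
]
new_points = {
    "met_1": [("2021-04-01", 115), ("2021-04-02", 113)],
    "met_2": [("2021-04-01", 0.45), ("2021-04-02", 0.36)],
}

# Rows are independent, so they can be processed in parallel
# (Spark would distribute them across executors instead of threads).
with ThreadPoolExecutor() as pool:
    trained = list(pool.map(train_udf, train_rows))
    to_score = [{**r, "time_series": new_points[r["metrics"]]} for r in trained]
    scored = list(pool.map(score_udf, to_score))

for row in scored:
    print(row["metrics"], [round(r["AnomalyProbability"], 3) for r in row["score_results"]])
```

Keeping each metric's history, model, and scores on one row is what makes the work embarrassingly parallel: no step needs data from another metric.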
  30. Our Integrations with Central Data Systems
      ● Self-service UI for easier onboarding
      ● Surfacing health metrics of the data source in the central data catalog
      ● Tagging producers and consumers of the anomaly detection jobs
      ● Smart alerting based on scoring output sensitivity
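The slide lists smart alerting based on scoring output sensitivity without giving details. One plausible reading, sketched below as an assumption, is that each consumer of a metric picks its own AnomalyProbability threshold. The same score can then page one team and stay silent for another. The consumer names and thresholds are hypothetical, as is the reuse of the simulated probabilities from the Spark slide.

```python
def route_alerts(latest_scores, subscriptions):
    """Return (consumer, metric, probability) for every subscription whose
    sensitivity threshold is met by the metric's latest AnomalyProbability."""
    alerts = []
    for consumer, metric, threshold in subscriptions:
        score = latest_scores.get(metric)
        if score and score.get("success") and score["AnomalyProbability"] >= threshold:
            alerts.append((consumer, metric, score["AnomalyProbability"]))
    return alerts


latest_scores = {
    "met_1": {"success": True, "AnomalyProbability": 0.85},
    "met_2": {"success": True, "AnomalyProbability": 0.995},
}
subscriptions = [
    ("dashboard-team", "met_1", 0.80),  # sensitive consumer: alert early
    ("ml-pipeline", "met_1", 0.99),     # only alert on strong evidence
    ("ml-pipeline", "met_2", 0.99),
]
print(route_alerts(latest_scores, subscriptions))
# [('dashboard-team', 'met_1', 0.85), ('ml-pipeline', 'met_2', 0.995)]
```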
  31. Future Direction
      ● Support anomaly detection beyond temporal context
      ● Build decision systems for ML pipelines using Luminaire
      ● Root cause analysis, to go one step further from detection to diagnosis
      ● User feedback to obtain labeled anomalies
  32. Key Takeaways
  33. Key Takeaways
      ● Luminaire is a Python library that supports anomaly detection for a wide variety of time series patterns and use cases
      ● Proposed a technique to build a fully automated anomaly detection system that scales to big data use cases and requires minimal maintenance
  34. Questions?
      Thank you!
      https://www.zillow.com/careers/
