Scaling AutoML-driven Anomaly Detection with Luminaire
Sayan Chakraborty
Smit Shah
Who We Are
Data Governance Platform Team @ Zillow
● Sayan Chakraborty, Senior Applied Scientist
● Smit Shah, Senior Software Development Engineer, Big Data
Agenda
● What is Zillow?
● Why Monitor Data Quality
● Data Quality Challenges
● Luminaire and Scaling
● Key Takeaways
Zillow
About Zillow
● Reimagining real estate to make it easier to unlock life’s next chapter
● Offer customers an on-demand experience for selling, buying, renting and financing with transparency and nearly seamless end-to-end service
● Most-visited real estate website in the United States*
* As of Q4 2020
Why Monitor Data Quality
Why Monitor Data Quality?
● Data fuels many customer-facing and internal services at Zillow, all of which rely on high-quality data
○ Zestimate
○ Zillow Offers
○ Zillow Premier Agent
○ Econ and many more
● Reliable performance of ML models and services requires a certain level of data quality
Why Detect Anomalies?
Anomaly: a data instance or behavior significantly different from the ‘regular’ patterns
Anomalies are complex, time-sensitive, and inevitable.
Catching anomalies in important metrics helps keep our business healthy.
Ways to Monitor Data Quality
Rule Based
● Domain experts set pre-specified rules or thresholds
○ Example: the percent of null data should be less than 2% per day for a given metric (see the sketch after this list)
● Less complicated to set up and easy to interpret
● Works well when the properties of the data are simple and remain stationary over time
ML Based
● Rules are set through mathematical modeling
● Works well when the properties of the data are complex and change over time
● A more hands-off approach
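To make the rule-based approach concrete, the 2% null-rate example above can be written in a few lines. This is a minimal sketch; the `null_rate_check` helper and the pandas DataFrame layout are assumptions for illustration, not part of any Zillow tooling:

import pandas as pd

def null_rate_check(day_df: pd.DataFrame, column: str, max_null_pct: float = 2.0) -> bool:
    # Rule-based check: the percent of null values for the day must stay under a fixed threshold.
    null_pct = day_df[column].isna().mean() * 100
    return null_pct <= max_null_pct

# One simulated day of a metric with a single missing value (20% nulls), which breaches the 2% rule.
day_df = pd.DataFrame({"metric_value": [125.0, None, 140.0, 138.0, 150.0]})
print(null_rate_check(day_df, "metric_value"))  # False

A rule like this is transparent, but every threshold has to be chosen and maintained by hand, which is exactly the work the ML-based approach takes over.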
Data Quality Challenges
Data Quality is Context Dependent
● Depends on the use case
● Depends on the reference time frame under consideration
○ Example: the same fluctuation can be interpreted differently when compared against shorter vs. longer reference time frames
● Depends on externalities such as holidays, product launches, market-specific events, etc.
Challenges
● Modeling
○ Wide range of time series patterns from different data sources - one model doesn’t fit all
○ The definition of an anomaly changes at different levels of aggregation of the same data
● Scaling and Standardization
○ Everyone (Analyst, PM, DE) should be able to use ML for anomaly detection and get trustworthy data (but not everyone is an ML expert)
○ Requires scalability to handle large amounts of data across teams
Wishlist for the System
● Able to catch any data irregularities
● Scales to large amounts of data and metrics
● Minimal configuration
● Minimal maintenance over time
No existing solution meets all of the above requirements.
Luminaire
Luminaire Python Package
Key Features:
● Integrated with different models
● AutoML built-in
● Proven to outperform many existing methods
● Time series data profiling enabled
● Built for batch and streaming use cases
GitHub: https://github.com/zillow/luminaire
Tutorials: https://zillow.github.io/luminaire/
Scientific Paper (IEEE BigData 2020): Building an Automated and Self-Aware Anomaly Detection System (arxiv link)
Luminaire Components
● Training Components
○ AutoML
○ Data Profiling / Preprocessing
○ Batch Data Modeling
○ Streaming Data Modeling
● Scoring Components
○ Pull Batch Model
○ Pull Streaming Model
○ Scoring/Alerting
Data Profiling / Preprocessing
>>> from luminaire.exploration.data_exploration import DataExploration
>>> de_obj = DataExploration(freq='D', data_shift_truncate=False,
...                          is_log_transformed=True, fill_rate=0.9)
>>> data, pre_prc = de_obj.profile(data)
>>> print(pre_prc)
{'success': True, 'trend_change_list': ['2020-04-01 00:00:00'],
 'change_point_list': ['2020-03-16 00:00:00'],
 'is_log_transformed': 1, 'min_ts_mean': None,
 'ts_start': '2020-01-01 00:00:00', 'ts_end': '2020-06-07 00:00:00'}
Training - Batch
>>> from luminaire.model.lad_structural import LADStructuralModel
>>> hyper_params = {"include_holidays_exog": True, "is_log_transformed": False,
...                 "max_ft_freq": 5, "p": 3, "q": 3}
>>> lad_struct_obj = LADStructuralModel(hyper_params=hyper_params, freq='D')
>>> success, model_date, model = lad_struct_obj.train(data=data, **pre_prc)
>>> print(success, model_date, model)
(True, '2020-06-07 00:00:00',
 <luminaire_models.model.lad_structural.LADStructuralModel object at 0x7f97e127d320>)
Training - Streaming
>>> from luminaire.model.window_density import WindowDensityHyperParams, WindowDensityModel
>>> from luminaire.exploration.data_exploration import DataExploration
>>> config = WindowDensityHyperParams().params
>>> de_obj = DataExploration(**config)
>>> data, pre_prc = de_obj.stream_profile(df=data)
>>> config.update(pre_prc)
>>> wdm_obj = WindowDensityModel(hyper_params=config)
>>> success, training_end, model = wdm_obj.train(data=data)
>>> print(success, training_end, model)
True 2020-07-03 00:00:00 <luminaire.model.window_density.WindowDensityModel object at 0x7fb6fab80b00>
AutoML - Configuration Optimization
>>> from luminaire.optimization.hyperparameter_optimization import HyperparameterOptimization
>>> hopt_obj = HyperparameterOptimization(freq='D')
>>> opt_config = hopt_obj.run(data=data)
>>> print(opt_config)
{'LuminaireModel': 'LADStructuralModel', 'data_shift_truncate': 0,
 'fill_rate': 0.742353444620679, 'include_holidays_exog': 1,
 'is_log_transformed': 1, 'max_ft_freq': 2, 'p': 1, 'q': 1}
>>> model_class_name = opt_config['LuminaireModel']
>>> module = __import__('luminaire.model', fromlist=[''])
>>> model_class = getattr(module, model_class_name)
>>> model_object = model_class(hyper_params=opt_config, freq='D')
>>> success, model_date, trained_model = model_object.train(data=training_data, **pre_prc)
>>> print(success, model_date, trained_model)
(True, '2020-06-07 00:00:00',
 <luminaire_models.model.lad_structural.LADStructuralModel object at 0x7fe2b47a7978>)
Scoring - Batch
>>> model.score(2000, '2020-06-08')
{'Success': True, 'IsLogTransformed': 1,
 'LogTransformedAdjustedActual': 7.601402334583733,
 'LogTransformedPrediction': 7.85697078664991,
 'LogTransformedStdErr': 0.05909378128162875,
 'LogTransformedCILower': 7.759770166178546,
 'LogTransformedCIUpper': 7.954171407121274,
 'AdjustedActual': 2000.000000000015,
 'Prediction': 1913.333800801316,
 'StdErr': 111.1165409184448,
 'CILower': 1722.81265596681,
 'CIUpper': 2093.854945635823,
 'ConfLevel': 90.0,
 'ExogenousHolidays': 0,
 'IsAnomaly': False,
 'IsAnomalyExtreme': False,
 'AnomalyProbability': 0.9616869199903785,
 'DownAnomalyProbability': 0.21915654000481077,
 'UpAnomalyProbability': 0.7808434599951892,
 'ModelFreshness': 0.1}
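Downstream, the scoring output can drive alerting. Below is a minimal sketch of one way to turn the dictionary above into an alert decision; the 0.99 probability threshold and the `should_alert` helper are illustrative assumptions, not Luminaire defaults:

def should_alert(score: dict, prob_threshold: float = 0.99) -> bool:
    # Alert when Luminaire flags the point, or when the anomaly probability is very high.
    # prob_threshold is an assumed sensitivity knob, not a Luminaire default.
    return score.get('IsAnomaly', False) or score.get('AnomalyProbability', 0.0) >= prob_threshold

score = model.score(2000, '2020-06-08')
if should_alert(score):
    print("Anomaly detected - notify the metric owners")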
Scoring - Streaming
>>> freq = model._params['freq']
>>> de_obj = DataExploration(freq=freq)
>>> processed_data, pre_prc = de_obj.stream_profile(df=data, impute_only=True, impute_zero=True)
>>> score, scored_window = model.score(processed_data)
>>> print(score)
{'Success': True, 'ConfLevel': 99.9, 'IsAnomaly': True, 'AnomalyProbability': 1.0}
Scaling
Scaling - Distributed Training/Scoring
Reference: https://www.zillow.com/tech/anomaly-detection-at-zillow-using-luminaire/
Scaling - Distributed using Spark

Training input (one row per metric):

metrics | time_series [data-date, observed-value]                           | run_date
met_1   | [[2021-01-01, 125], [2021-01-02, 135], [2021-01-03, 140], ...]    | 2021-02-01 00:00:00
met_2   | [[2021-01-01, 0.17], [2021-01-02, 0.19], [2021-01-03, 0.22], ...] | 2021-02-01 00:00:00

        | UDF (Train)
        v

metrics | time_series [data-date, observed-value]                           | run_date            | model_object
met_1   | [[2021-01-01, 125], [2021-01-02, 135], [2021-01-03, 140], ...]    | 2021-02-01 00:00:00 | <object_met_1>
met_2   | [[2021-01-01, 0.17], [2021-01-02, 0.19], [2021-01-03, 0.22], ...] | 2021-02-01 00:00:00 | <object_met_2>

Scoring input (new observations joined with the stored model objects):

metrics | time_series [data-date, observed-value]  | run_date            | model_object
met_1   | [[2021-04-01, 115], [2021-04-02, 113]]   | 2021-04-02 00:00:00 | <object_met_1>
met_2   | [[2021-04-01, 0.45], [2021-04-02, 0.36]] | 2021-04-02 00:00:00 | <object_met_2>

        | UDF (Score)
        v

metrics | time_series [data-date, observed-value]  | run_date            | score_results
met_1   | [[2021-04-01, 115], [2021-04-02, 113]]   | 2021-04-02 00:00:00 | [{"success": True, "AnomalyProbability": 0.85, ...}, ...]
met_2   | [[2021-04-01, 0.45], [2021-04-02, 0.36]] | 2021-04-02 00:00:00 | [{"success": True, "AnomalyProbability": 0.995, ...}, ...]

* These values are simulated
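The tables above translate into a simple Spark pattern: train one Luminaire model per metric inside a UDF, persist the model object as a column, and score new observations against it with a second UDF. The sketch below is a minimal illustration under assumed column names (`metrics`, `time_series`, `run_date`) and an assumed `train_luminaire` helper wrapping the profiling and training calls shown earlier; it is not Zillow's production implementation.

import pickle
import pandas as pd
from pyspark.sql import functions as F, types as T

# metrics_df: the Spark DataFrame from the first table above (metrics, time_series, run_date),
# where time_series is an array of (data_date, observed_value) structs - an assumed schema.

@F.udf(returnType=T.BinaryType())
def train_udf(time_series):
    # Rebuild one metric's history as a pandas time series from the nested column.
    ts = pd.DataFrame(
        [(row['data_date'], row['observed_value']) for row in time_series],
        columns=['date', 'value'],
    ).set_index('date')
    # train_luminaire is an assumed helper wrapping the profiling + LADStructuralModel
    # training steps shown earlier; it returns a trained model object.
    model = train_luminaire(ts)
    return pickle.dumps(model)  # store the trained model alongside the metric

trained_df = metrics_df.withColumn('model_object', train_udf('time_series'))

@F.udf(returnType=T.StringType())
def score_udf(time_series, model_bytes):
    model = pickle.loads(model_bytes)
    # Score each new observation against the stored model, as in the batch scoring example.
    results = [model.score(row['observed_value'], str(row['data_date'])) for row in time_series]
    return str(results)

scored_df = trained_df.withColumn('score_results', score_udf('time_series', 'model_object'))

Persisting the trained model object as a column is what lets the scoring run (2021-04-02 in the simulated tables) reuse models trained earlier (2021-02-01) without retraining.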
Our Integrations with Central Data Systems
● Self-service UI for easier on-boarding
● Surfacing health metrics of the data source in the central data catalog
● Tagging producers and consumers of the anomaly detection jobs
● Smart alerting based on scoring output sensitivity
Future Direction
● Support anomaly detection beyond the temporal context
● Build decision systems for ML pipelines using Luminaire
● Root cause analysis to go a step beyond detection to diagnosis
● User feedback to collect labeled anomalies
Key Takeaways
Key Takeaways
● Luminaire is a Python library that supports anomaly detection for a wide variety of time series patterns and use cases
● We proposed a technique to build a fully automated anomaly detection system that scales to big data use cases and requires minimal maintenance
Questions?
Thank you!
https://www.zillow.com/careers/