Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow

As Atlassian continues to scale to more and more customers, the demand for our legendary support continues to grow. Atlassian needs to balance the staffing levels needed to service this increasing support ticket volume against the budgetary constraints needed to keep the business healthy. Automated ticket volume forecasting is at the centre of this delicate balance.

Published in: Data & Analytics

  1. 1. PERRY STEPHENSON | SENIOR SOFTWARE ENGINEER | ATLASSIAN Automatic Forecasting Creating a robust, fault-tolerant, auditable and reproducible forecasting pipeline
  2. 2. Why am I speaking? Perry Stephenson, ATLASSIAN
     - Empowered End User: Building pipelines with end-user tooling means I can Get S#!* Done™ all by myself
     - Reproducibility: Lots of people talk about it. I did something about it.
     - I ❤ Databricks: It makes my life easier and makes me look good in front of colleagues and managers
  3. 3. BUSINESS CASE
  4. 4. CONTEXT Forecasting is a special case for Machine Learning because you need to retrain every time you make a …
  5. 5. Holdout
  6. 6. Holdout
  7. 7. Ensuring Fresh Forecasts
     - End-to-End Pipeline: Create a robust, fault-tolerant, auditable and reproducible forecasting pipeline
     - Backtesting: Evaluate the whole pipeline using backtesting. Prioritise stability
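The backtesting idea on this slide can be sketched in base R as a rolling-origin evaluation: refit at each cutoff date, then score on the period that follows. This is only an illustration under stated assumptions. The function name `backtest`, the 28-day horizon, the toy series and the mean-based stand-in model are all hypothetical; the pipeline in the talk trains Prophet models instead. The `ds` and `y` column names follow Prophet's input convention.

```r
# Rolling-origin backtest: for each cutoff, fit on data up to the cutoff and
# score on the following `horizon_days`. The "model" is just the training mean,
# a stand-in for the real forecasting model.
backtest <- function(series, cutoffs, horizon_days = 28) {
  rows <- lapply(seq_along(cutoffs), function(i) {
    cutoff <- cutoffs[i]
    train <- series[series$ds <= cutoff, ]
    test <- series[series$ds > cutoff & series$ds <= cutoff + horizon_days, ]
    prediction <- mean(train$y)
    data.frame(
      cutoff = cutoff,
      n_train = nrow(train),
      n_test = nrow(test),
      mae = mean(abs(test$y - prediction))
    )
  })
  do.call(rbind, rows)
}

# Made-up daily series, 120 days long.
series <- data.frame(
  ds = seq(as.Date("2020-01-01"), by = "day", length.out = 120),
  y = rep(c(10, 12, 14, 12), times = 30)
)

# Month-end cutoffs mimic a monthly forecast run being backfilled.
cutoffs <- as.Date(c("2020-02-29", "2020-03-31"))
result <- backtest(series, cutoffs)
print(result)
```

Comparing the error across cutoffs, rather than at a single holdout, is what lets stability be judged alongside accuracy.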
  8. 8. MEME TRANSLATION Update Dataset Train Models Score Models Not Production Production Update Dataset Train Models Score Models Production
  9. 9. Support Ticket Forecasting Pipeline, Scheduler (monthly): (1) Create Modelling Dataset, (2) Update Modelling Dataset, (3) Train Forecast Models, (4) Score Forecast Models
  10. 10. Agenda: Delta Lake, MLflow Tracking, Notebook Workflows, Review Pipeline
  11. 11. Delta Lake Reproducible Training Data
  12. 12. Support Ticket Forecasting Pipeline, Scheduling: (1) Create Modelling Dataset, (2) Update Modelling Dataset, (3) Train Forecast Models, (4) Score Forecast Models
  13. 13. Support Ticket Forecasting Pipeline: (1) Create Modelling Dataset
      create table zone_myteam.my_table
      using delta
      as select * from … … …
  14. 14. Support Ticket Forecasting Pipeline: (2) Update Modelling Dataset
      - Runs every month to update the dataset
      - Merges changes and retains history by leveraging Delta Lake
  15. 15. MORE INFO
 ABOUT THE PIPELINE DELTA LAKE THIS TALK >>>
  16. 16. FAKE NEWS
  17. 17. DELTA LAKE GIVES YOU VERSIONED TABLES FOR REPRODUCIBLE DATA SCIENCE
  18. 18. merge into zone_myteam.my_table as existing
      using my_latest_data as new
      on existing.date_name = new.date_name
        and existing.platform = new.platform
        and existing.customer_region = new.customer_region
      when matched then update set *
      when not matched then insert *
  19. 19. Support Ticket Forecasting Pipeline: Update Modelling Dataset
      - Creates a new version every time it runs
      - Latest version is always the most accurate
      - Can recover any previous version of the training dataset
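To make the merge-and-version behaviour concrete, here is a toy base R sketch that mimics it on in-memory data frames. It is not Delta Lake and does not use any Delta API: the function `merge_into`, the `tickets` column and the sample rows are hypothetical, and only the three key columns come from the merge statement in the slides. Delta numbers table versions from 0, whereas this list is 1-indexed.

```r
# Toy stand-in for a versioned table: a list of snapshots, newest last.
# Each merge appends a new snapshot and leaves earlier ones untouched.
merge_into <- function(versions, new_rows, keys) {
  existing <- versions[[length(versions)]]
  key_of <- function(df) do.call(paste, c(as.list(df[keys]), sep = "|"))
  matched <- key_of(existing) %in% key_of(new_rows)
  # "when matched then update set *": matched rows are replaced by the new rows.
  # "when not matched then insert *": unmatched new rows are appended.
  merged <- rbind(existing[!matched, , drop = FALSE], new_rows)
  c(versions, list(merged))
}

keys <- c("date_name", "platform", "customer_region")

initial <- data.frame(
  date_name = c("2020-01", "2020-02"),
  platform = c("cloud", "cloud"),
  customer_region = c("APAC", "APAC"),
  tickets = c(100, 110),
  stringsAsFactors = FALSE
)

# The latest extract restates February and adds March.
latest <- data.frame(
  date_name = c("2020-02", "2020-03"),
  platform = c("cloud", "cloud"),
  customer_region = c("APAC", "APAC"),
  tickets = c(115, 120),
  stringsAsFactors = FALSE
)

versions <- merge_into(list(initial), latest, keys)

# The newest snapshot has the corrected February value;
# the earlier snapshot is still recoverable as it was.
print(versions[[2]])
print(versions[[1]])
```

The point of the sketch is the last two lines: a model trained last month can still be traced to exactly the rows it saw, even after those rows were restated.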
  20. 20. MLflow Tracking Reproducible Forecasting Models
  21. 21. Support Ticket Forecasting Pipeline, Scheduling: Create Modelling Dataset, Update Modelling Dataset, Train Forecast Models, Score Forecast Models
  22. 22. Support Ticket Forecasting Pipeline: Train Forecast Models
      - Written in R, using Facebook Prophet
      - Trains a model for every ticket grouping, stores model + metadata in MLflow
      - Takes an argument for “forecast_date” to allow backfilling, defaults to month end
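One way the "defaults to month end" behaviour could look in base R is sketched below. The slides do not say which month end is meant, so this assumes the last day of the month containing the run date. Both function names are hypothetical, and the empty-string check assumes the argument arrives as a string.

```r
# Last day of the month containing `d`.
# Assumption: this is what "month end" means for the default.
month_end <- function(d = Sys.Date()) {
  first_of_month <- as.Date(format(as.Date(d), "%Y-%m-01"))
  first_of_next <- seq(first_of_month, by = "month", length.out = 2)[2]
  first_of_next - 1
}

# Hypothetical argument handling: an empty or missing value falls back to the
# default, while an explicit date allows backfilling an earlier forecast.
resolve_forecast_date <- function(forecast_date = "") {
  if (is.null(forecast_date) || !nzchar(forecast_date)) {
    month_end()
  } else {
    as.Date(forecast_date)
  }
}

print(month_end("2020-02-10"))
print(resolve_forecast_date("2019-11-30"))
```

Keeping the date as an explicit argument is what makes backfilling possible: the same training code can be rerun for any past month without modification.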
  23. 23. MLflow THIS TALK >>> MORE INFO
 ABOUT THE PIPELINE
  24. 24. LOGGING TO MLFLOW
      library(mlflow)
      mlflow_client_static <- mlflow_client()
      run_info <- mlflow_start_run(experiment_id = 611628)
      … … …
      # record the inputs that identify this model
      mlflow_log_param(key = "forecast_date", value = forecast_date)
      mlflow_log_param(key = "platform", value = platform)
      … … …
      # store the serialised Prophet model with the run
      mlflow_log_artifact(path = 'prophet_model.rds', artifact_path = 'model')
      … … …
      mlflow_end_run(run_id = run_info$run_uuid, client = mlflow_client_static)
  25. 25. Scheduling: Create Modelling Dataset, Update Modelling Dataset, Train Forecast Models, Score Forecast Models. Delta, MLflow, Delta
  26. 26. Support Ticket Forecasting Pipeline: Score Forecast Models
      - Reads from MLflow (two passes to recover params), builds an execution plan
      - Scores each forecast, and prepares aggregates for consumption
      - Appends/overwrites results in our data lake, with history maintained using Delta
      - Includes MLflow links for every row in the forecast table
  27. 27. READING FROM MLFLOW
      required_forecasts <- mlflow_list_run_infos(experiment_id = 611628)
      # seq_len() avoids a spurious iteration when there are no runs
      for (i in seq_len(nrow(required_forecasts))) {
        run_details <- mlflow_get_run(required_forecasts$run_uuid[i])
        run_params <- run_details$params[[1]]
        … … …
        # not shown: unpack params and score model
        # not shown: weekly/monthly/quarterly aggregations
        # not shown: add MLflow URL
      }
      # not shown: union and upload all forecasts at once
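The aggregation step that the slide leaves out might look something like the following base R sketch for the monthly case. The function name and the sample values are made up. The `ds` and `yhat` column names follow Prophet's forecast output, and summing is an assumption that suits count data such as ticket volumes.

```r
# Roll a daily forecast up to calendar-month totals.
aggregate_monthly <- function(forecast) {
  forecast$month <- format(forecast$ds, "%Y-%m")
  aggregate(yhat ~ month, data = forecast, FUN = sum)
}

# Made-up daily forecast spanning a month boundary.
daily <- data.frame(
  ds = seq(as.Date("2020-01-30"), by = "day", length.out = 4),
  yhat = c(10, 20, 30, 40)
)

monthly <- aggregate_monthly(daily)
print(monthly)
```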
  28. 28. Support Ticket Forecasting Pipeline: Score Forecast Models
      - Uploads all results to an in-memory temporary table
      - Deletes all records from the final table with the same forecast_date
      - Merges changes into the forecast output table
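The delete-then-merge pattern above is what makes a rerun safe: scoring the same forecast_date twice replaces the earlier rows instead of duplicating them. A minimal base R sketch of that property on in-memory data frames follows. The function name, the `yhat` column and the sample rows are hypothetical, and the real pipeline does this against Delta tables rather than data frames.

```r
replace_forecast_date <- function(final, staged) {
  # Delete every existing record that shares a forecast_date with the new batch...
  kept <- final[!(final$forecast_date %in% staged$forecast_date), , drop = FALSE]
  # ...then add the new batch, so a rerun replaces rather than duplicates.
  rbind(kept, staged)
}

final <- data.frame(
  forecast_date = c("2020-01-31", "2020-01-31", "2020-02-29", "2020-02-29"),
  yhat = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# A fresh batch of results for a forecast_date that already exists.
staged <- data.frame(
  forecast_date = c("2020-02-29", "2020-02-29"),
  yhat = c(30, 40),
  stringsAsFactors = FALSE
)

once <- replace_forecast_date(final, staged)
twice <- replace_forecast_date(once, staged)

# Same row count and same values after the second run.
print(nrow(once))
print(nrow(twice))
```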
  29. 29. Notebook Workflows Flexible and Reliable Execution
  30. 30. NOTEBOOK WORKFLOWS THIS TALK >>> MORE INFO
 ABOUT THE PIPELINE
  31. 31. WIDGETS (AKA NOTEBOOK ARGUMENTS)
  32. 32. Review
  33. 33. Scheduling (15th day of the month): Update Modelling Dataset, Train Forecast Models, Score Forecast Models. Delta table, MLflow
  34. 34. ULTIMATE PRETTY GOOD REPRODUCIBILITY: Forecast Table, Forecast Row, MLflow URL, Model Binary, Table Name, Table Version, Training Data
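The chain on this slide ends at the training data, which a recorded table name and version make recoverable through Delta Lake's `version as of` time-travel syntax. The base R sketch below only builds that query string. The lineage record, its field names, the URL and the version number are all hypothetical, since the slides list which pieces are linked but not how they are stored.

```r
# Hypothetical lineage record for one forecast row.
lineage <- list(
  mlflow_url = "https://example.invalid/mlflow/run/abc123",
  table_name = "zone_myteam.my_table",
  table_version = 3L
)

# Build a Delta Lake time-travel query for the exact training snapshot.
training_data_query <- function(table_name, table_version) {
  sprintf("select * from %s version as of %d", table_name, as.integer(table_version))
}

query <- training_data_query(lineage$table_name, lineage$table_version)
print(query)
```

On a cluster with Delta Lake available, the resulting string could be passed to whatever SQL entry point the notebook uses.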
  35. 35. CONCLUSION The Databricks platform supports the chaos during early development, and provides pathways to …
