Operationalizing Machine Learning models is never easy. Our team at Comcast has been challenged with operationalizing predictive ML models to improve customer care experiences. Using Apache Flink we have been able to apply real-time streaming to all aspects of the Machine Learning lifecycle. This includes data feature exploration and preparation by data scientists, deploying live models to serve near-real-time predictions, and validating results for model retraining and iteration. We will share best practices and lessons learned from Flink’s role in our operationalized lifecycle including:
• Executing as the “Prediction Pipeline” – a model container environment for near-real-time streaming and batch predictions
• Preparing streaming features and data sets for model training, as input for production model predictions, and for a continually-updated customer context
• Using connected streams and savepoints for “Live in the Dark”, multi-variant testing, and validation scenarios
• Incorporating Flink’s Queryable State as an approach to the online “Feature Store” – a data catalog for reuse by multiple models and use cases
• Enabling versioned models, versioned feature sets, and versioned data through DevOps approaches.