Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. It’s widely adapted in Airbnb and we have variety of models running in production. We have seen the overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.