This document provides an introduction and overview of Hivemall, an open source machine learning library built as a collection of Hive UDFs. It begins with background on the presenter, Makoto Yui, and then covers the following key points:
- What Hivemall is and its vision of bringing machine learning capabilities to SQL users
- Popular algorithms supported in current and upcoming versions, such as random forest, factorization machines, and gradient boosted trees
- Real-world use cases at companies, such as click-through rate prediction, user profiling, and churn detection
- How to use algorithms like random forest, matrix factorization, and factorization machines from SQL queries
- The development roadmap, with upcoming features including natural language processing (NLP)