This document gives an overview of Apache Hivemall, a scalable machine learning library built as a collection of Hive UDFs (user-defined functions). Key points:
- Hivemall lets users perform machine learning tasks such as classification, regression, recommendation, and anomaly detection using SQL queries in Hive or Spark SQL, or scripts in Pig Latin.
- It provides a number of popular machine learning algorithms, such as logistic regression, decision trees, and factorization machines.
- Hivemall is multi-platform, so models built in one system can be used in another. Because it runs as UDFs inside these engines, ML tasks are parallelized across the cluster.
- It has been adopted by several companies for applications like click-through prediction, user