
Introduction to Apache Hivemall v0.5.2 and v0.6


  1. Introduction to Apache Hivemall v0.5.2 and v0.6. Makoto YUI, Principal Engineer (@myui, @ApacheHivemall). Hadoop Conf Japan - Mar 14, 2019
  2. We Open-source! Streaming log collector; Bulk data import/export; Efficient binary serialization; Machine learning on Hadoop; Workflow Engine; Embedded version of Fluentd
  3. Machine Learning in SQL queries
  4. BigQuery ML at Google I/O 2018 (https://ai.googleblog.com/2018/07/machine-learning-in-google-bigquery.html)
  5. Could I use ML-in-SQL in my cluster?
  6. Open-source Machine Learning Solution for SQL-on-Hadoop: hivemall.apache.org (incubating)
  7. Hivemall is a multi/cross-platform ML library that provides a rich set of functions, usable from HiveQL, the SparkSQL/DataFrame API, and Pig Latin
  8. Hivemall on Apache Hive
  9. Hivemall on Apache Spark DataFrame
  10. Hivemall on SparkSQL
  11. Hivemall on Apache Pig
  12. Online Prediction by Apache Streaming
  13. New in v0.5.2 – Brickhouse UDFs: JSON, HyperLogLog
  14. New in v0.5.2 – Field-aware Factorization Machines
  15. New in v0.5.2 – Okapi BM25 term weighting
  16. Plan for v0.6 (release in April-May, 2019):
      - New state-of-the-art optimizers like AdamHD (merged)
      - Gradient boosting
      - Stable XGBoost support
      - More efficient sparse vector support in RandomForest
      - Spark 2.4 support
  17. XGBoost support in Hivemall (beta version)

      SELECT train_xgboost_classifier(features, label) AS (model_id, model)
      FROM training_data;

      SELECT rowid, AVG(predicted) AS predicted
      FROM (
        -- predict with each model
        SELECT xgboost_predict(rowid, features, model_id, model) AS (rowid, predicted)
        -- join each test record with each model
        FROM xgboost_models CROSS JOIN test_data_with_id
      ) t
      GROUP BY rowid;
  18. Future work (v0.7 or later):
      - Word2Vec support
      - Multi-class Logistic Regression
      - Hyperparameter tuning (e.g., grid search)
      - Yarn application/standalone Hivemall
      Pull requests referenced on the slide: PR#91, PR#116
  19. We are hiring: Engineer (Java/Scala/Ruby), Data Scientist, Sales Engineer, SRE, Support Engineer
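
In the XGBoost prediction query on slide 17, every test row is paired with every trained model via a CROSS JOIN, each pair yields one prediction, and the outer query averages those predictions per rowid. The following is only a minimal Python sketch of that averaging step, not Hivemall's implementation: the function `average_predictions`, the two "models" (plain functions standing in for trained boosters), and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def average_predictions(models, test_rows):
    """Average per-model predictions for each row id.

    models: dict of model_id -> callable(features) -> float
            (hypothetical stand-ins, not real XGBoost models)
    test_rows: list of (rowid, features) tuples
    """
    per_row = defaultdict(list)
    # Equivalent of the CROSS JOIN: pair every model with every test row
    for (model_id, predict), (rowid, features) in product(models.items(), test_rows):
        per_row[rowid].append(predict(features))
    # Equivalent of GROUP BY rowid with AVG(predicted)
    return {rowid: sum(preds) / len(preds) for rowid, preds in per_row.items()}


# Hypothetical models: simple functions standing in for trained boosters
models = {
    "m1": lambda f: 0.25 * sum(f),
    "m2": lambda f: 0.75 * sum(f),
}
# Made-up test data: (rowid, features)
test_rows = [(1, [1.0, 1.0]), (2, [2.0, 2.0])]

averaged = average_predictions(models, test_rows)
print(averaged)
```

Averaging is what lets the training query emit more than one (model_id, model) row, for example one per parallel task, while the prediction query still returns a single score per test record.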
