Podling Hivemall in the Apache Incubator

Lightning Talk at #cwt2016
http://www.clouderaworldtokyo.com/

Published in: Data & Analytics

  1. Podling Hivemall in the Apache Incubator. Makoto YUI, Research Engineer, @myui <myui@treasure-data.com>. 2016/11/08, Apache Hadoop Meetup at CWT 2016.
  2. Hivemall entered the Apache Incubator on Sept 13, 2016 🎉 hivemall.incubator.apache.org @ApacheHivemall
  3. Initial committers
     • Makoto Yui <Treasure Data>
     • Takeshi Yamamuro <NTT>: Hivemall on Apache Spark
     • Daniel Dai <Hortonworks>: Hivemall on Apache Pig; Apache Pig PMC member
     • Tsuyoshi Ozawa <NTT>: Apache Hadoop PMC member
     • Kai Sasaki <Treasure Data>
  4. Project mentors (champion and nominated mentors)
     • Reynold Xin <Databricks, ASF member>: Apache Spark PMC member
     • Markus Weimer <Microsoft, ASF member>: Apache REEF PMC member
     • Xiangrui Meng <Databricks, ASF member>: Apache Spark PMC member
     • Roman Shaposhnik <Pivotal, ASF member>: Apache Bigtop/Incubator PMC member
  5. What is Apache Hivemall? A scalable machine learning library built as a collection of Hive UDFs. Key properties: Multi/Cross platform, Versatile, Scalable, Ease-of-use.
  6. Hivemall is easy and scalable (Ease-of-use, Scalable)
     • ML made easy for SQL developers
     • Born to be parallel and scalable
     [Figure: Classification with Mahout]

         CREATE TABLE lr_model AS
         SELECT
           feature,               -- reducers perform model averaging in parallel
           avg(weight) as weight
         FROM (
           SELECT logress(features,label,..) as (feature,weight)
           FROM train
         ) t                      -- map-only task
         GROUP BY feature;        -- shuffled to reducers

     This SQL query automatically runs in parallel on a Hadoop cluster.
  7. Hivemall is a multi/cross-platform ML library, available from HiveQL, the SparkSQL/DataFrame API, and Pig Latin. Prediction models built by Hive can be used from Spark, and conversely, prediction models built by Spark can be used from Hive.
  8. Hivemall on Apache Hive
  9. Hivemall on Apache Spark DataFrame
  10. Hivemall on SparkSQL
  11. Hivemall on Apache Pig
  12. Hivemall is a versatile library
     ✓ Hivemall is not only for machine learning
     ✓ Hivemall provides a bunch of generic utility functions (e.g., top-k, NLP)
     Each organization has its own set of UDFs for data preprocessing. Don't Repeat Yourself!
  13. Conclusion and takeaway: Hivemall is a machine learning library that is multi/cross-platform, versatile, scalable, and easy to use. We welcome your contributions to Apache Hivemall :) hivemall.incubator.apache.org
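
The training query on slide 6 follows a map, shuffle, reduce pattern: each map task trains on its own slice of `train` and emits `(feature, weight)` rows, the rows are shuffled by `feature`, and reducers average the weights. The following is a minimal Python sketch of that pattern, not Hivemall's implementation. It assumes `logress` fits a logistic regression by stochastic gradient descent; the function names (`train_partition`, `average_weights`, `predict`), the learning rate, the epoch count, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=20):
    """Map side (assumed behaviour): SGD logistic regression on one partition.

    Emits (feature, weight) pairs, like the inner SELECT in the slide's query.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            pred = sigmoid(sum(w[f] * v for f, v in features.items()))
            err = label - pred
            for f, v in features.items():
                w[f] += eta * err * v
    return list(w.items())


def average_weights(emitted):
    """Reduce side: avg(weight) ... GROUP BY feature."""
    groups = defaultdict(list)
    for feature, weight in emitted:
        groups[feature].append(weight)
    return {f: sum(ws) / len(ws) for f, ws in groups.items()}


def predict(model, features):
    """Score a row with the averaged model; unseen features count as 0."""
    return sigmoid(sum(model.get(f, 0.0) * v for f, v in features.items()))


# Two hypothetical partitions of the training table, one per map task.
partitions = [
    [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -1.5}, 0)],
    [({"bias": 1.0, "x": 1.0}, 1), ({"bias": 1.0, "x": -2.0}, 0)],
]

# "Shuffle": collect every (feature, weight) pair emitted by the map tasks.
emitted = [pair for part in partitions for pair in train_partition(part)]
model = average_weights(emitted)
```

The resulting model is just a feature-to-weight table, which is why a model produced on one engine can be joined against feature rows on another, as slide 7 describes.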
