
Apache SystemML AI/ML


This presentation gives an overview of the Apache SystemML AI/ML project. It explains Apache SystemML in terms of its functionality and dependencies, and describes how Apache SystemDS was forked from it to provide greater functionality.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Published in: Technology


  1. What Is Apache SystemML?
     ● A machine learning system
     ● Designed to scale to Spark / Hadoop clusters
     ● Open source / Apache 2 license
     ● Developed in Java
     ● Supports R-like and Python-like languages
     ● These languages are designed to scale into the big data range
     ● Automatic optimization based on data and cluster characteristics
  2. SystemML Execution Modes
     ● SystemML supports multiple execution modes, including
       – Standalone
       – Spark Batch
       – Spark MLContext
       – Hadoop Batch
       – Java Machine Learning Connector (JMLC)
  3. SystemML Dependencies
     ● SystemDS was forked from SystemML 1.2
     ● Current dependencies
       – Java 8+
       – Scala 2.11+
       – Python 2.7/3.5+
       – Hadoop 2.6+
       – Spark 2.1+
  4. What Is Apache SystemDS?
     ● Forked from Apache SystemML 1.2 in September 2018
     ● Supports linear algebra programs over matrices
     ● Replaces the underlying data model and compiler
     ● Substantially extends the supported functionalities
     ● Supports the whole data science lifecycle
       – Data integration, cleaning
       – Feature engineering
       – Efficient local and distributed ML model training
       – Deployment, serving
  5. What Is Apache SystemDS?
     ● R-like languages for
       – The data science lifecycle stages
       – Differing expertise levels
     ● High-level scripts are compiled into hybrid execution plans
       – For local, in-memory CPU / GPU operations
       – For distributed operations on Apache Spark
     ● The underlying data model is the DataTensor
       – A tensor (multi-dimensional array) whose first dimension may have a heterogeneous and nested schema
  6. SystemDS Algorithms
     ● Descriptive Statistics
       – Univariate Statistics
       – Bivariate Statistics
       – Stratified Bivariate Statistics
     ● Classification
       – Multinomial Logistic Regression
       – Support Vector Machines
         ● Binary-Class Support Vector Machines
         ● Multi-Class Support Vector Machines
       – Naive Bayes
       – Decision Trees
       – Random Forests
  7. SystemDS Algorithms
     ● Clustering
       – K-Means Clustering
     ● Regression
       – Linear Regression
       – Stepwise Linear Regression
       – Generalized Linear Models
       – Stepwise Generalized Linear Regression
       – Regression Scoring and Prediction
     ● Matrix Factorization
       – Principal Component Analysis
       – Matrix Completion via Alternating Minimizations
  8. SystemDS Algorithms
     ● Survival Analysis
       – Kaplan-Meier Survival Analysis
       – Cox Proportional Hazard Regression Model
     ● Factorization Machines
       – Factorization Machine
  9. SystemDS Deep Neural Nets
     ● Use SystemDS to implement deep neural networks
       – Specify the network in Keras format and invoke it with the Keras2DML API
       – Specify the network in Caffe format and invoke it with the Caffe2DML API
       – Use the SystemDS-NN library, which is written in DML
     ● Ease compute resource pressure during training with
       – Native BLAS (Basic Linear Algebra Subprograms)
       – The SystemDS GPU backend
  10. Available Books
     ● See “Big Data Made Easy” – Apress Jan 2015
     ● See “Mastering Apache Spark” – Packt Oct 2015
     ● See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
     ● Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     ● Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
  11. Connect
     ● Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     ● See my open source blog at
       – open-source-systems.blogspot.com/
     ● I am always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration
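
Slide 5 states that high-level scripts are compiled into hybrid execution plans that mix local in-memory operations with distributed operations on Apache Spark. The toy Python sketch below shows one way such a per-operation choice could be made, assuming dense double-precision matrices and a single local memory budget. It is not SystemDS code: `MatrixShape`, `choose_backend`, `plan_matmul` and the budget value are hypothetical names and numbers introduced only for illustration, and the slides do not describe how the actual compiler decides.

```python
from dataclasses import dataclass
from typing import List

BYTES_PER_DOUBLE = 8  # assumption: every cell is a dense 64-bit float


@dataclass
class MatrixShape:
    """Dimensions of a matrix; no data is stored (hypothetical helper)."""
    rows: int
    cols: int

    def dense_bytes(self) -> int:
        # Memory estimate for a dense double-precision matrix.
        return self.rows * self.cols * BYTES_PER_DOUBLE


def choose_backend(inputs: List[MatrixShape], output: MatrixShape,
                   local_budget_bytes: int) -> str:
    """Pick "local" if all operands plus the result fit in the budget,
    otherwise "distributed". A simplified, assumed decision rule."""
    needed = sum(m.dense_bytes() for m in inputs) + output.dense_bytes()
    return "local" if needed <= local_budget_bytes else "distributed"


def plan_matmul(a: MatrixShape, b: MatrixShape, local_budget_bytes: int) -> str:
    """Plan a matrix multiplication a %*% b under the toy rule above."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions must match")
    result = MatrixShape(a.rows, b.cols)
    return choose_backend([a, b], result, local_budget_bytes)


if __name__ == "__main__":
    budget = 1 << 30  # hypothetical 1 GiB local memory budget
    small = plan_matmul(MatrixShape(1000, 1000), MatrixShape(1000, 1000), budget)
    large = plan_matmul(MatrixShape(10_000_000, 1000), MatrixShape(1000, 10), budget)
    print("small multiply ->", small)
    print("large multiply ->", large)
```

Under these assumptions, a single script can contain both kinds of operation: the small multiplication fits in the local budget, while the large one exceeds it and would be planned as a distributed operation.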
