SAP HANA SPS 08 - What’s New?
Predictive Analysis Library
SAP HANA Product Management, May 2014
(Delta from SPS 07 to SPS 08)
© 2014 SAP AG or an SAP affiliate company. All rights reserved. Public
Agenda
Release Theme
List of Algorithms
New Algorithms
• Distribution Fit
• Cumulative Distribution Function
• Quantile Function
• Random Distribution Sampling
• ARIMA
• FP-Growth
• CART
• K-Medoid Clustering
Enhancements
Documentation
Release Theme
HANA Predictive Analysis Library – What’s New in SPS 08?
The SPS 08 version of the Predictive Analysis Library (PAL) adds eight new algorithms and several
enhancements to existing algorithms.
These features were prioritized based on requests from customers and other stakeholders.
List of Algorithms
SAP HANA In-Memory Predictive Analytics
Predictive Analysis Library (PAL) - Algorithms Supported
Association Analysis
• Apriori
• Apriori Lite
• FP-Growth *
Classification Analysis
• CART *
• C4.5 Decision Tree Analysis
• CHAID Decision Tree Analysis
• K Nearest Neighbour
• Logistic Regression
• Naïve Bayes
• Support Vector Machine
Regression
• Multiple Linear Regression
• Polynomial Regression
• Exponential Regression
• Bi-Variate Geometric Regression
• Bi-Variate Logarithmic Regression
Outlier Detection
• Inter-Quartile Range Test (Tukey’s Test)
• Variance Test
• Anomaly Detection
Statistic Functions (Univariate)
• Mean, Median, Variance, Standard Deviation
• Kurtosis
• Skewness
Link Prediction
• Common Neighbors
• Jaccard’s Coefficient
• Adamic/Adar
• Katz β
Data Preparation
• Sampling
• Random Distribution Sampling *
• Binning
• Scaling
• Partitioning
Statistic Functions (Multivariate)
• Covariance Matrix
• Pearson Correlations Matrix
• Chi-squared Tests:
  – Test of Quality of Fit
  – Test of Independence
• F-test (variance equal test)
Other
• Weighted Scores Table
• Substitute Missing Values
Cluster Analysis
• ABC Classification
• DBSCAN
• K-Means
• K-Medoid Clustering *
• Kohonen Self Organized Maps
• Agglomerate Hierarchical
• Affinity Propagation
Time Series Analysis
• Single Exponential Smoothing
• Double Exponential Smoothing
• Triple Exponential Smoothing
• Forecast Smoothing
• ARIMA *
Probability Distribution
• Distribution Fit *
• Cumulative Distribution Function *
• Quantile Function *
* New in SPS 08
New Algorithms
Distribution Fit
Distribution fitting finds the probability distribution that best describes a variable, based on a series
of measurements of that variable.
In PAL, the user chooses one probability distribution type from the supported list (Normal, Gamma,
Weibull, and Uniform), and PAL calculates the parameters of that distribution that best fit the
observed data.
There are two distribution fitting interfaces: DISTRFIT and DISTRFITCENSORED. DISTRFIT fits
uncensored data, while DISTRFITCENSORED fits censored data.
Two methods are provided for finding the optimized parameters: Maximum-Likelihood and
Median-Rank. In SPS 08, the Maximum-Likelihood method supports all distribution types in the
supported list for uncensored data. The Median-Rank method supports Weibull distribution fitting for
both censored and uncensored data.
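To make the two fitting methods concrete, here is a minimal Python sketch for uncensored data. It is not the PAL implementation and does not call DISTRFIT. The function names are hypothetical, and the Median-Rank variant assumes Benard's approximation for the median ranks followed by a least-squares line fit, which is one common way to apply the method.

```python
import math

def fit_normal_mle(data):
    """Maximum-likelihood estimate (mean, standard deviation) of a Normal distribution."""
    n = len(data)
    mean = sum(data) / n
    # The maximum-likelihood variance divides by n, not n - 1.
    variance = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(variance)

def fit_weibull_median_rank(data):
    """Median-rank regression estimate (shape, scale) of a Weibull distribution.

    Assumption: median ranks use Benard's approximation (i - 0.3) / (n + 0.4).
    """
    xs = sorted(data)
    n = len(xs)
    # Linearised Weibull CDF: ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale)
    us = [math.log(x) for x in xs]
    vs = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    intercept = mean_v - slope * mean_u
    return slope, math.exp(-intercept / slope)
```

For example, fit_normal_mle([2, 4, 4, 4, 5, 5, 7, 9]) yields a mean of 5.0 and a standard deviation of 2.0.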
Cumulative Distribution Function
For a given probability distribution, this PAL function evaluates the cumulative distribution function
(CDF), P(X ≤ x), or the complementary cumulative distribution function (CCDF), P(X > x), at a
value x.
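As an illustration only, the following Python sketch evaluates the CDF and CCDF for two of the distributions named in this deck. It uses the Python standard library rather than PAL, and the Weibull helper names and parameter names (shape, scale) are assumptions made for the example.

```python
import math
from statistics import NormalDist

def weibull_cdf(x, shape, scale):
    """P(X <= x) for a Weibull distribution with the given shape and scale."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def weibull_ccdf(x, shape, scale):
    """P(X > x), the complementary CDF of the same distribution."""
    if x <= 0:
        return 1.0
    return math.exp(-((x / scale) ** shape))

def normal_cdf(x, mean, sd):
    """P(X <= x) for a Normal distribution."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

def normal_ccdf(x, mean, sd):
    """P(X > x) for a Normal distribution."""
    return 1.0 - normal_cdf(x, mean, sd)
```

The CDF and CCDF at the same x always sum to 1, so either one can be derived from the other.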
Quantile Function
In PAL, the quantile function evaluates the inverse F^(-1)(p) of the cumulative distribution function
(CDF) or the inverse F̄^(-1)(p) of the complementary cumulative distribution function (CCDF) for a
given probability p and probability distribution.
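A short Python sketch of both inverses for the Weibull distribution, where a closed form exists. This is illustrative and not PAL code; the function name and the complementary flag are hypothetical.

```python
import math

def weibull_quantile(p, shape, scale, complementary=False):
    """Return x such that CDF(x) = p, or CCDF(x) = p when complementary is True.

    Weibull CDF:  F(x) = 1 - exp(-(x / scale) ** shape)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # For the CCDF, the tail probability is p itself; for the CDF it is 1 - p.
    tail = p if complementary else 1.0 - p
    return scale * (-math.log(tail)) ** (1.0 / shape)
```

Because CCDF(x) = 1 - CDF(x), the CCDF inverse at p equals the CDF inverse at 1 - p.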
Random Distribution Sampling
Generates random samples from a given distribution (Normal, Gamma, Weibull, or Uniform).
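The same four distributions are available in Python's standard random module, which gives a quick way to see what such a function produces. The wrapper below is a hypothetical sketch and not the PAL interface; its parameter names are assumptions.

```python
import random

def sample(distribution, n, seed=None, **params):
    """Draw n random values from one of four named distributions."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    generators = {
        "normal": lambda: rng.gauss(params["mean"], params["sd"]),
        # random.gammavariate takes (shape, scale)
        "gamma": lambda: rng.gammavariate(params["shape"], params["scale"]),
        # random.weibullvariate takes (scale, shape), in that order
        "weibull": lambda: rng.weibullvariate(params["scale"], params["shape"]),
        "uniform": lambda: rng.uniform(params["low"], params["high"]),
    }
    draw = generators[distribution]
    return [draw() for _ in range(n)]
```

A call such as sample("weibull", 1000, seed=42, shape=1.5, scale=3.0) returns 1000 non-negative values.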
ARIMA
The autoregressive integrated moving average (ARIMA) algorithm is widely used in econometrics,
statistics, and time series analysis. An ARIMA model is written as ARIMA(p, d, q), where p is the
autoregressive order, d is the order of integration (differencing), and q is the moving average order.
The model helps to understand time series data and to predict future values in the series. Both
training and forecast functions are provided.
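To show how the three orders interact, the Python sketch below handles only the simplest non-trivial case, ARIMA(1, 1, 0) without a constant: difference once (d = 1), fit one autoregressive coefficient (p = 1) by least squares, and integrate the forecasts back. It is a teaching sketch under those assumptions, not the estimation method used by PAL, and all function names are hypothetical.

```python
def difference(series):
    """First-order differencing: the 'integrated' part with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]

def train_ar1(values):
    """Least-squares estimate of phi in w[t] = phi * w[t-1] (no constant term)."""
    numerator = sum(cur * prev for prev, cur in zip(values, values[1:]))
    denominator = sum(prev * prev for prev in values[:-1])
    return numerator / denominator

def forecast_arima_110(series, steps):
    """Train on the differenced series, then forecast and undo the differencing."""
    diffs = difference(series)
    phi = train_ar1(diffs)
    level, last_diff, forecasts = series[-1], diffs[-1], []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next difference
        level += last_diff            # integrate back to the original scale
        forecasts.append(level)
    return phi, forecasts
```

For the series 0, 8, 12, 14, 15 the differences are 8, 4, 2, 1, the fitted coefficient is 0.5, and the next two forecasts are 15.5 and 15.75.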
FP-Growth
FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate
itemsets. In PAL, the FP-Growth algorithm is extended to find association rules. In the first step, the
algorithm converts the transactions into a compressed frequent pattern tree (FP-Tree). In the second
step, it recursively finds frequent patterns in the FP-Tree. In the last step, PAL generates association
rules from the frequent patterns found in the second step.
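The three steps can be sketched in plain Python as follows. This is a compact illustration of the general FP-Growth technique, not the PAL implementation; the class and function names, the (items, count) input format, and the use of absolute counts for the minimum support are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Step 1: compress (items, count) pairs into an FP-tree with a header table."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq

def mine(weighted_transactions, min_count, suffix=frozenset()):
    """Step 2: recursively mine frequent itemsets from conditional FP-trees."""
    header, freq = build_tree(weighted_transactions, min_count)
    patterns = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        patterns[itemset] = freq[item]
        # Conditional pattern base: the prefix path above every node for `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(mine(base, min_count, itemset))
    return patterns

def rules(patterns, min_confidence):
    """Step 3: derive association rules (lhs, rhs, confidence) from the itemsets."""
    found = []
    for itemset, support in patterns.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support / patterns[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), confidence))
    return found
```

No candidate itemsets are generated and counted against the data here: every itemset that appears in the result is read directly off a (conditional) FP-tree.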
CART
Classification and Regression Tree (CART) was introduced by Breiman et al. (1984). It supports only
binary splits and can be used for classification or regression. Like C4.5, CART is a recursive
partitioning method. It uses the Gini index or the twoing criterion for classification and least squares
error for regression. The surrogate split method is used to handle missing values when creating the
tree model.
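As a small illustration of the Gini-based binary split, the Python sketch below searches one numeric feature for the threshold that minimises the weighted Gini impurity of the two child nodes. It covers a single split only (no recursion, twoing, regression, or surrogate splits), and the function names are hypothetical rather than part of PAL.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'.

    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree would apply the same search to every feature and then recurse into the two child nodes.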
K-Medoid Clustering
The K-Medoids algorithm is a clustering algorithm related to K-Means. Both algorithms partition n
observations into k clusters, with each observation assigned to the cluster with the closest center.
Unlike K-Means, K-Medoids does not calculate means but uses medoids as the new cluster centers.
A medoid is the object in a cluster whose average dissimilarity to all other objects in the cluster is
minimal. Compared to K-Means, K-Medoids is generally considered more robust to noise and
outliers.
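The definition of a medoid translates almost directly into code. The Python sketch below uses a simple alternating scheme (assign each point to its nearest medoid, then recompute each medoid), which is one of several known K-Medoids variants and not necessarily the one PAL uses. The function names and the caller-supplied dist function are assumptions for the example.

```python
def medoid(points, dist):
    """Return the member of `points` with the smallest total dissimilarity to the rest."""
    return min(points, key=lambda candidate: sum(dist(candidate, p) for p in points))

def k_medoids(points, initial_medoids, dist, max_iter=100):
    """Alternate between assigning points and updating medoids until nothing changes."""
    medoids = list(initial_medoids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the new center is an actual data point, not a mean.
        new_medoids = [medoid(c, dist) if c else m for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters
```

For the values 1, 2, 3, 4, 100 the medoid is 3 while the mean is 22, which shows why a single outlier moves a medoid far less than it moves a mean.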
Enhancements
Enhancements (1 of 2)
Logistic regression
• Supports cancellation at runtime.
• Supports multinomial classification. Many business scenarios require a classifier with more than
two classes. Multi-class logistic regression (also called multinomial logistic regression) extends the
binary (two-class) logistic regression algorithm to multi-class cases. The input and output of
multi-class logistic regression are similar to those of binary logistic regression.
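The step from two classes to many can be sketched with the softmax function, the usual generalisation of the logistic (sigmoid) function. The Python example below shows prediction only, with hypothetical names and made-up weights supplied by the caller. It is not PAL code and says nothing about how PAL trains the model.

```python
import math

def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(weights, biases, features):
    """Multinomial logistic regression prediction: one linear score per class."""
    scores = [sum(w * x for w, x in zip(class_weights, features)) + b
              for class_weights, b in zip(weights, biases)]
    return softmax(scores)

def sigmoid(z):
    """Binary logistic function, for comparison with the two-class softmax."""
    return 1.0 / (1.0 + math.exp(-z))
```

With exactly two classes, softmax([z, 0])[0] equals sigmoid(z), so the binary model is the two-class special case.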
K-Means
Determines the best k within a given range, based on the slight silhouette.
Apriori
• Adds a prefix tree implementation, which may reduce memory consumption and run time.
• Adds a rule filter that restricts specified items to the left-hand or right-hand side of the
association rules.
Enhancements (2 of 2)
Forecast Smoothing
Automatically detects the best model among the single, double, and triple exponential smoothing models.
Hierarchical clustering
Supports categorical attributes as input features.
Univariate statistics
• Supports population variance and population standard deviation.
• Calculates the lower and upper quartiles of the data.
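The difference between the population and sample statistics, and one possible quartile definition, can be checked with Python's statistics module. This sketch is independent of PAL; the function name is hypothetical, and the "inclusive" quartile method is an assumption, since several quartile conventions exist and the one PAL uses is not stated here.

```python
import statistics

def describe(data):
    """Population and sample dispersion plus lower/upper quartiles."""
    lower, _median, upper = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "population_variance": statistics.pvariance(data),  # divides by n
        "population_sd": statistics.pstdev(data),
        "sample_variance": statistics.variance(data),       # divides by n - 1
        "lower_quartile": lower,
        "upper_quartile": upper,
    }
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 the population variance is 4 and the population standard deviation is 2, while the sample variance is 32/7, roughly 4.57.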
Decision tree
Missing values can be treated as a separate class instead of only being replaced.
Disclaimer
This presentation outlines our general product direction and should not be relied on in making
a purchase decision. This presentation is not subject to your license agreement or any other
agreement with SAP.
SAP has no obligation to pursue any course of business outlined in this presentation or to
develop or release any functionality mentioned in this presentation. This presentation and
SAP’s strategy and possible future developments are subject to change and may be changed
by SAP at any time for any reason without notice.
This document is provided without a warranty of any kind, either express or implied, including
but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or
non-infringement. SAP assumes no responsibility for errors or omissions in this document,
except if such damages were caused by SAP intentionally or grossly negligent.
Documentation
Important Note
SAP Note 2022080 addresses the missing EXECUTION privilege needed to call
AFL_WRAPPER_GENERATOR and AFL_WRAPPER_ERASER after an upgrade to HANA SPS 08.
How to find SAP HANA documentation on this topic?
SAP HANA Platform SPS
• What’s New – Release Notes
• Installation
  – SAP HANA Server Installation Guide
• Administration
  – SAP HANA Administration Guide
• Development
  – SAP HANA Predictive Analysis Library (PAL) Reference
  – SAP HANA Developer Guide
• References
  – SAP HANA SQL Reference
• In addition to this learning material, you can find SAP HANA documentation in the
SAP Help Portal knowledge center at http://help.sap.com/hana_platform.
• The knowledge center is structured according to the product lifecycle: installation, security, administration,
and development. For example, the SAP HANA Predictive Analysis Library (PAL) Reference
is in the Development section.
Thank you
Contact information
Mark Hourani
SAP HANA Product Management
AskSAPHANA@sap.com
To get the best overview of what’s new in SAP HANA SPS 08, read this blog.
© 2014 SAP AG or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG (or an SAP affiliate
company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP AG or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP AG or its
affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP AG or SAP affiliate company products and services
are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an
additional warranty.
In particular, SAP AG or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or
release any functionality mentioned therein. This document, or any related presentation, and SAP AG’s or its affiliated companies’ strategy and possible future
developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP AG or its affiliated companies at any time for
any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-
looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place
undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
