SAP HANA SPS08 Predictive Analysis Library

SAP HANA SPS 08 - What’s New? Predictive Analysis Library

Transcript

  • 1. SAP HANA SPS 08 - What’s New? Predictive Analysis Library. SAP HANA Product Management, May 2014 (Delta from SPS 07 to SPS 08)
  • 2. © 2014 SAP AG or an SAP affiliate company. All rights reserved. Public. Agenda: Release Theme; List of Algorithms; New Algorithms (Distribution Fit, Cumulative Distribution Function, Quantile Function, Random Distribution Sampling, ARIMA, FP-Growth, CART, K-Medoid Clustering); Enhancements; Documentation
  • 3. Release Theme
  • 4. HANA Predictive Analysis Library – What’s New in SPS 08? Release Theme: The SPS 08 version of the Predictive Analysis Library includes many new algorithms as well as several enhancements to existing algorithms. These features were prioritized based on requests from customers and other stakeholders.
  • 5. List of Algorithms
  • 6. SAP HANA In-Memory Predictive Analytics: Predictive Analysis Library (PAL) Algorithms Supported (* = new in SPS 08)
    • Association Analysis: Apriori, Apriori Lite, FP-Growth *
    • Classification Analysis: CART *, C4.5 Decision Tree Analysis, CHAID Decision Tree Analysis, K Nearest Neighbour, Logistic Regression, Naïve Bayes, Support Vector Machine
    • Regression: Multiple Linear Regression, Polynomial Regression, Exponential Regression, Bi-Variate Geometric Regression, Bi-Variate Logarithmic Regression
    • Outlier Detection: Inter-Quartile Range Test (Tukey’s Test), Variance Test, Anomaly Detection
    • Statistic Functions (Univariate): Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness
    • Link Prediction: Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katz β
    • Data Preparation: Sampling, Random Distribution Sampling *, Binning, Scaling, Partitioning
    • Statistic Functions (Multivariate): Covariance Matrix, Pearson Correlations Matrix, Chi-squared Tests (Test of Quality of Fit, Test of Independence), F-test (variance equal test)
    • Other: Weighted Scores Table, Substitute Missing Values
    • Cluster Analysis: ABC Classification, DBSCAN, K-Means, K-Medoid Clustering *, Kohonen Self Organized Maps, Agglomerate Hierarchical, Affinity Propagation
    • Time Series Analysis: Single Exponential Smoothing, Double Exponential Smoothing, Triple Exponential Smoothing, Forecast Smoothing, ARIMA *
    • Probability Distribution: Distribution Fit *, Cumulative Distribution Function *, Quantile Function *
  • 7. New Algorithms
  • 8. Distribution Fit: Distribution fitting aims to find the probability distribution that best describes a variable, given a series of measurements of that variable. In PAL, users choose one distribution type from the supported list (Normal, Gamma, Weibull, and Uniform), and PAL then calculates the parameters of that distribution that best fit the observed variable. There are two distribution-fitting interfaces: DISTRFIT fits uncensored data, while DISTRFITCENSORED fits censored data. Two methods are provided for finding the optimized parameters: Maximum-Likelihood and Median-Rank. In SPS 08, the Maximum-Likelihood method supports all distribution types in the supported list for uncensored data, and the Median-Rank method supports Weibull distribution fitting for both censored and uncensored data.
  • 9. Cumulative Distribution Function: For a given value x and a given probability distribution, the PAL cumulative distribution function evaluates either the cumulative distribution function (CDF), $F(x) = P(X \le x)$, or the complementary cumulative distribution function (CCDF), $\bar{F}(x) = 1 - F(x) = P(X > x)$.
  • 10. Quantile Function: For a given probability p and probability distribution, the PAL quantile function evaluates either the inverse of the cumulative distribution function (CDF), $F^{-1}(p)$, or the inverse of the complementary cumulative distribution function (CCDF), $\bar{F}^{-1}(p)$.
  • 11. Random Distribution Sampling: Generates random samples from a given distribution (Normal, Gamma, Weibull, or Uniform).
  • 12. ARIMA: The autoregressive integrated moving average (ARIMA) algorithm is widely used in econometrics, statistics, and time series analysis. An ARIMA model is written as ARIMA(p, d, q), where p is the autoregressive order, d is the order of integration (the number of differencing steps), and q is the moving-average order. ARIMA helps users understand time series data and predict future values in the series. Both training and forecast functions are provided.
  • 13. FP-Growth: FP-Growth is an algorithm that finds frequent patterns in transactions without generating candidate itemsets. In PAL, the FP-Growth algorithm is extended to find association rules. In the first step, the algorithm compresses the transactions into a frequent pattern tree (FP-Tree). In the second step, it recursively finds frequent patterns in the FP-Tree. In the last step, PAL generates association rules from the frequent patterns found in the second step.
  • 14. CART: Classification and Regression Tree (CART) was introduced by Breiman et al. (1984). It supports only binary splits and can be used for classification or regression. Like C4.5, CART is a recursive partitioning method. It uses the Gini index or the twoing criterion for classification, and least squared error for regression. Surrogate splits are used to handle missing values when building the tree model.
  • 15. K-Medoid Clustering: The K-Medoids algorithm is a clustering algorithm related to K-Means. Both partition n observations into k clusters, assigning each observation to the cluster with the closest center. Unlike K-Means, K-Medoids does not use means as the new cluster centers; it uses medoids instead. A medoid is the object in a cluster whose average dissimilarity to all objects in the cluster is minimal. K-Medoids is generally considered more robust to noise and outliers than K-Means.
  • 16. Enhancements
  • 17. Enhancements (1 of 2)
    • Logistic regression: supports cancellation at runtime and multinomial classification. In many business scenarios, a classifier must be trained on more than two classes. Multi-class logistic regression (also called multinomial logistic regression) extends the binary (two-class) logistic regression algorithm to multi-class cases. Its input and output are similar to those of binary logistic regression.
    • K-Means: determines the best k within a given range according to the slight silhouette.
    • Apriori: adds a prefix-tree implementation that can reduce memory consumption and run time, and adds a rule filter that allows specified items to appear only on the left-hand or right-hand side of the association rules.
  • 18. Enhancements (2 of 2)
    • Forecast Smoothing: automatically detects the best model among single, double, and triple exponential smoothing.
    • Hierarchical clustering: supports categorical attributes as input features.
    • Univariate statistics: supports population variance and standard deviation, and calculates the lower and upper quartiles of the data.
    • Decision tree: can treat missing values as a separate class instead of only replacing NULL values.
  • 19. Disclaimer: This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
  • 20. Documentation
  • 21. Important Note: SAP Note 2022080 was created to address the missing EXECUTION privilege required to call AFL_WRAPPER_GENERATOR/ERASER during the SAP HANA SPS 08 upgrade.
  • 22. How to find SAP HANA documentation on this topic?
    • SAP HANA Platform SPS
      • What’s New: Release Notes
      • Installation: SAP HANA Server Installation Guide
      • Administration: SAP HANA Administration Guide
      • Development: SAP HANA Predictive Analysis Library (PAL) Reference, SAP HANA Developer Guide
      • References: SAP HANA SQL Reference
    • In addition to this learning material, you can find SAP HANA documentation in the SAP Help Portal knowledge center at http://help.sap.com/hana_platform.
    • The knowledge center is structured according to the product lifecycle: installation, security, administration, and development. For example, the SAP HANA Predictive Analysis Library (PAL) Reference is in the Development section.
  • 23. Thank you. Contact information: Mark Hourani, SAP HANA Product Management, AskSAPHANA@sap.com. To get the best overview of what’s new in SAP HANA SPS 08, read this blog.
  • 24. © 2014 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG or an SAP affiliate company. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP AG or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP AG or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP AG or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP AG’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP AG or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
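Slide 8 (Distribution Fit) names Median-Rank as a method for fitting a Weibull distribution but does not spell out the computation. The following minimal Python sketch shows one common form of median-rank regression for uncensored data, under stated assumptions: median ranks come from Bernard's approximation $F_i \approx (i - 0.3)/(n + 0.4)$, and the Weibull CDF is linearized as $\ln(-\ln(1 - F)) = \beta \ln t - \beta \ln \eta$, so a least-squares line gives the shape β as its slope and the scale η from its intercept. The function name `weibull_median_rank_fit` is hypothetical, and PAL's DISTRFIT may rank or regress differently.

```python
import math


def weibull_median_rank_fit(samples):
    """Estimate Weibull (shape, scale) from uncensored samples by median-rank regression.

    Assumption: Bernard's median-rank approximation plus an ordinary least-squares
    line; PAL's exact Median-Rank procedure may differ.
    """
    t = sorted(samples)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)                 # Bernard's approximation of the median rank
        xs.append(math.log(ti))                   # x = ln(t)
        ys.append(math.log(-math.log(1.0 - f)))   # y = ln(-ln(1 - F))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    shape = sxy / sxx                             # slope of y = shape * ln(t) - shape * ln(scale)
    intercept = mean_y - shape * mean_x
    scale = math.exp(-intercept / shape)          # intercept = -shape * ln(scale)
    return shape, scale


if __name__ == "__main__":
    data = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]  # illustrative failure times
    shape, scale = weibull_median_rank_fit(data)
    print(f"shape={shape:.3f}, scale={scale:.3f}")
```

Censored data, which DISTRFITCENSORED handles, would require adjusted ranks; this sketch does not attempt that.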
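Slide 9 (Cumulative Distribution Function) describes evaluating the CDF or CCDF of a chosen distribution at a point x. A minimal Python sketch of that evaluation, using only the standard library, follows. The `dist` strings and the parameter keys (`mean`/`sd`, `shape`/`scale`, `min`/`max`) are hypothetical names chosen for illustration, not PAL's interface, and Gamma is left out because the standard library has no regularized incomplete gamma function.

```python
import math
from statistics import NormalDist


def distribution_cdf(x, dist, params, complementary=False):
    """Return F(x) = P(X <= x), or the CCDF 1 - F(x) when complementary=True.

    `dist` and the `params` keys are illustrative names, not PAL's interface.
    """
    if dist == "normal":
        p = NormalDist(params["mean"], params["sd"]).cdf(x)
    elif dist == "weibull":
        # Weibull CDF: 1 - exp(-(x / scale) ** shape) for x >= 0, and 0 below zero
        p = 0.0 if x < 0 else 1.0 - math.exp(-((x / params["scale"]) ** params["shape"]))
    elif dist == "uniform":
        lo, hi = params["min"], params["max"]
        p = min(max((x - lo) / (hi - lo), 0.0), 1.0)   # clamp outside [min, max]
    else:
        raise ValueError(f"unsupported distribution: {dist}")
    return 1.0 - p if complementary else p
```

For example, `distribution_cdf(1.0, "uniform", {"min": 0, "max": 4})` returns 0.25, and passing `complementary=True` returns 0.75.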
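Slide 10's quantile function is the inverse of the CDF or CCDF. For a continuous distribution, $\bar{F}(x) = p$ exactly when $F(x) = 1 - p$, so $\bar{F}^{-1}(p) = F^{-1}(1 - p)$; the Python sketch below relies on that identity. As in the CDF case, the distribution names and parameter keys are illustrative assumptions rather than PAL's API, and Gamma is omitted.

```python
import math
from statistics import NormalDist


def distribution_quantile(p, dist, params, complementary=False):
    """Return F^{-1}(p), or the inverse CCDF when complementary=True.

    Uses F̄^{-1}(p) = F^{-1}(1 - p) for continuous distributions.
    Parameter names are illustrative, not PAL's interface.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p if complementary else p
    if dist == "normal":
        return NormalDist(params["mean"], params["sd"]).inv_cdf(q)
    if dist == "weibull":
        # Solve 1 - exp(-(x / scale) ** shape) = q for x
        return params["scale"] * (-math.log(1.0 - q)) ** (1.0 / params["shape"])
    if dist == "uniform":
        return params["min"] + q * (params["max"] - params["min"])
    raise ValueError(f"unsupported distribution: {dist}")
```

For a Uniform(0, 4) distribution, the 0.25 quantile of the CDF is 1.0, while the 0.25 quantile of the CCDF is 3.0.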
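Slide 11 (Random Distribution Sampling) lists the distributions PAL can sample from. Python's standard `random` module provides generators for all four, so a minimal sketch under that assumption needs no extra libraries. The function name and parameter keys are hypothetical; the comments note the argument order of `gammavariate` and `weibullvariate`, which is easy to mix up.

```python
import random


def sample_distribution(dist, params, n, seed=None):
    """Draw n pseudo-random samples from the named distribution (illustrative names)."""
    rng = random.Random(seed)   # a seeded generator gives reproducible draws
    samplers = {
        "normal": lambda: rng.gauss(params["mean"], params["sd"]),
        # random.gammavariate(alpha, beta): alpha is the shape, beta is the scale
        "gamma": lambda: rng.gammavariate(params["shape"], params["scale"]),
        # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
        "weibull": lambda: rng.weibullvariate(params["scale"], params["shape"]),
        "uniform": lambda: rng.uniform(params["min"], params["max"]),
    }
    if dist not in samplers:
        raise ValueError(f"unsupported distribution: {dist}")
    draw = samplers[dist]
    return [draw() for _ in range(n)]
```

Fixing `seed` makes the draws reproducible, which helps when comparing runs.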
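To make the (p, d, q) notation on slide 12 (ARIMA) concrete, the following Python sketch fits the simplest non-trivial case, ARIMA(1, 1, 0): it differences the series once (d = 1), estimates a single autoregressive coefficient φ on the differences by conditional least squares (p = 1), and has no moving-average terms (q = 0). It is a teaching sketch that assumes no constant term; it does not reproduce PAL's estimation method, and the function names are hypothetical.

```python
def fit_arima_110(series):
    """Estimate phi for ARIMA(1, 1, 0) without a constant term.

    d = 1: difference the series once; p = 1: regress each difference on the
    previous one by conditional least squares; q = 0: no moving-average terms.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    prev, curr = diffs[:-1], diffs[1:]
    denom = sum(x * x for x in prev)
    if denom == 0:
        raise ValueError("differenced series has no variation")
    return sum(x * y for x, y in zip(prev, curr)) / denom


def forecast_arima_110(series, phi, steps):
    """Forecast `steps` future values by predicting differences and integrating them."""
    value = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff          # AR(1) prediction of the next difference
        value += diff              # undo the differencing (the "I" in ARIMA)
        forecasts.append(value)
    return forecasts
```

For the series `[0, 8, 12, 14, 15, 15.5]`, whose differences halve at each step, `fit_arima_110` returns φ = 0.5 and `forecast_arima_110(series, 0.5, 2)` returns `[15.75, 15.875]`.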
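The three steps on slide 13 (FP-Growth: build an FP-Tree, mine it recursively, derive rules) can be sketched in plain Python as follows. This is a textbook FP-Growth, not PAL's implementation: minimum support is assumed to be an absolute transaction count, each transaction is passed as an `(items, weight)` pair so that the same function can mine the weighted conditional pattern bases, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class _Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def _build_fp_tree(weighted_transactions, min_support):
    """Insert transactions into an FP-tree; return frequent-item supports and node lists."""
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {item: s for item, s in support.items() if s >= min_support}
    root, nodes_by_item = _Node(None, None), defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, ordered by descending support (ties broken by item)
        path = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = _Node(item, node)
                nodes_by_item[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return frequent, nodes_by_item


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    frequent, nodes_by_item = _build_fp_tree(weighted_transactions, min_support)
    result = {}
    for item, sup in frequent.items():
        itemset = (item,) + suffix
        result[frozenset(itemset)] = sup
        # Conditional pattern base: the prefix path above each node holding `item`
        conditional = []
        for node in nodes_by_item[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_support, itemset))
    return result


def association_rules(itemsets, min_confidence):
    """Derive (lhs, rhs, confidence) rules from the frequent itemsets."""
    rules = []
    for itemset, sup in itemsets.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                confidence = sup / itemsets[lhs]   # every subset of a frequent itemset is frequent
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

Because candidate itemsets are never enumerated, the cost is driven by the size of the compressed tree rather than by the number of possible item combinations.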
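Slide 14 (CART) says CART uses the Gini index to choose binary splits for classification. A minimal Python sketch of that one step, for a single numeric feature, scores every midpoint threshold by the size-weighted Gini impurity of the two children and keeps the lowest. The function names are hypothetical, and the twoing criterion, regression splits, and surrogate splits are not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Impurity of the two children, weighted by how many rows each receives
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full CART learner would apply this search to every feature, split on the best one, and recurse on each child.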
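The K-Medoids description on slide 15 can be illustrated with a simple alternating scheme: assign each point to its nearest medoid, then replace each medoid with the cluster member that minimizes total dissimilarity, until nothing changes. This Python sketch is the Voronoi-iteration variant, not the PAM swap search, and it uses a naive first-k initialization; which variant and initialization PAL uses is not stated on the slide. The distance function is supplied by the caller, and the function names are hypothetical.

```python
def _assign(points, medoids, dist):
    """Group each point with its closest medoid (ties go to the lower index)."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda j: dist(p, medoids[j]))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points, k, dist, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(points[:k])   # naive initialization: the first k points
    for _ in range(max_iter):
        clusters = _assign(points, medoids, dist)
        new_medoids = [
            # The medoid is the member with the smallest total dissimilarity to its cluster
            min(members, key=lambda c: sum(dist(c, q) for q in members)) if members else old
            for members, old in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, _assign(points, medoids, dist)
```

With `k_medoids([1, 2, 3, 100, 10, 11, 12], 2, lambda a, b: abs(a - b))`, the medoids converge to `[2, 11]`. The mean of the second cluster {10, 11, 12, 100} would be 33.25, which illustrates the robustness to outliers mentioned on the slide.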
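Slide 17 describes multinomial logistic regression as an extension of binary logistic regression to more than two classes. The standard formulation replaces the sigmoid with a softmax over one linear score per class, $P(y = k \mid x) = e^{w_k \cdot x + b_k} / \sum_j e^{w_j \cdot x + b_j}$. The Python sketch below trains it by plain batch gradient descent on the cross-entropy loss; the learning rate, epoch count, and function names are illustrative assumptions, and PAL's solver is not described on the slide.

```python
import math


def softmax_probabilities(x, weights, biases):
    """P(class k | x) = exp(w_k . x + b_k) / sum_j exp(w_j . x + b_j)."""
    scores = [sum(w * xi for w, xi in zip(wk, x)) + bk for wk, bk in zip(weights, biases)]
    top = max(scores)                        # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_multinomial(X, y, n_classes, lr=0.1, epochs=200):
    """Batch gradient descent on the multinomial (softmax) cross-entropy loss."""
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax_probabilities(x, W, b)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)   # gradient w.r.t. score k
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / len(X)
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / len(X)
    return W, b


def predict(x, W, b):
    """Return the index of the most probable class."""
    probs = softmax_probabilities(x, W, b)
    return max(range(len(probs)), key=lambda k: probs[k])
```

With two classes, the softmax reduces to the familiar sigmoid of the score difference, which is why the input and output resemble those of binary logistic regression.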
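Slide 18 adds population variance, population standard deviation, and quartiles to PAL's univariate statistics. Python's `statistics` module computes both the sample (divide by n − 1) and population (divide by n) forms, so the distinction can be shown directly. The `describe` name is hypothetical, and quartile conventions differ between tools, so PAL's quartiles may not match Python's default exclusive method.

```python
import statistics


def describe(data):
    """Univariate summary with both sample (n - 1) and population (n) dispersion."""
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "sample_variance": statistics.variance(data),        # divides by n - 1
        "population_variance": statistics.pvariance(data),   # divides by n
        "sample_sd": statistics.stdev(data),
        "population_sd": statistics.pstdev(data),
        # Lower quartile, median, upper quartile; Python's default "exclusive"
        # interpolation may differ from PAL's quartile convention.
        "quartiles": statistics.quantiles(data, n=4),
    }
```

For `[2, 4, 4, 4, 5, 5, 7, 9]`, the population variance is 4 and the population standard deviation is 2, while the sample variance is 32/7 (about 4.571).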