SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler

See what's new in SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler

Published in: Technology

  1. SAP HANA SPS 10 – What’s New? Predictive Analysis Library & Application Function Modeler. SAP HANA Product Management, June 2015 (Delta from SPS 09 to SPS 10). © 2015 SAP SE or an SAP affiliate company. All rights reserved.
  2. Disclaimer: This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP’s strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
  3. Predictive Analysis Library (PAL)
  4. Agenda: Predictive Analysis Library (PAL)
     • Release Theme
     • List of Algorithms
     • General Changes
     • New Algorithms: Confusion Matrix; Parameter Selection and Model Evaluation; Gaussian Mixture Model; Latent Dirichlet Allocation (LDA); Test for White Noise, Trend, Seasonality; Grubbs Outlier Test; Seasonal ARIMA
     • Enhancements
     • Documentation
  5. Release Theme
  6. HANA Predictive Analysis Library – What’s New in SPS 10? Release Theme: The SPS 10 version of the Predictive Analysis Library includes many new algorithms as well as several enhancements to existing algorithms. These new features were chosen based on the prioritization of customer and other stakeholder requests.
  7. List of Algorithms
  8. SAP HANA In-Memory Predictive Analytics: Predictive Analysis Library (PAL) - Algorithms Supported (* = new in SPS 10)
     • Association Analysis: Apriori; Apriori Lite; FP-Growth; KORD (Top K Rule Discovery)
     • Classification Analysis: CART*; C4.5 Decision Tree Analysis; CHAID Decision Tree Analysis; K Nearest Neighbour; Logistic Regression; Back-Propagation (Neural Network); Naïve Bayes; Support Vector Machine; Confusion Matrix*; Parameter Selection & Model Evaluation*
     • Regression: Multiple Linear Regression; Polynomial Regression; Exponential Regression; Bi-Variate Geometric Regression; Bi-Variate Logarithmic Regression
     • Probability Distribution: Distribution Fit; Cumulative Distribution Function; Quantile Function
     • Outlier Detection: Inter-Quartile Range Test (Tukey’s Test); Variance Test; Anomaly Detection; Grubbs Outlier Test*
     • Link Prediction: Common Neighbors; Jaccard’s Coefficient; Adamic/Adar; Katzβ
     • Data Preparation: Sampling; Random Distribution Sampling; Binning; Scaling; Partitioning; Principal Component Analysis (PCA)
     • Statistic Functions (Univariate): Mean, Median, Variance, Standard Deviation; Kurtosis; Skewness
     • Statistic Functions (Multivariate): Covariance Matrix; Pearson Correlations Matrix; Chi-squared Tests (Test of Quality of Fit, Test of Independence); F-test (variance equal test)
     • Other: Weighted Scores Table; Substitute Missing Values
     • Cluster Analysis: ABC Classification; DBSCAN; K-Means; K-Medoid Clustering; K-Medians; Kohonen Self Organized Maps; Agglomerate Hierarchical; Affinity Propagation; Gaussian Mixture Model*; Latent Dirichlet Allocation (LDA)*
     • Time Series Analysis: Single Exponential Smoothing; Double Exponential Smoothing; Triple Exponential Smoothing; Forecast Smoothing; ARIMA / Seasonal ARIMA*; Brown Exponential Smoothing; Croston Method; Forecast Accuracy Measure; Linear Regression with Damped Trend and Seasonal Adjust; Test for White Noise, Trend, Seasonality*
  9. General Changes
  10. General Changes
      • Enhanced PAL exception handling for the HANA SQL handler: exceptions thrown by PAL can be caught by the exception handler in a SQLScript procedure. Error code 423 (AFL error).
      • PAL function integration with SAP HANA Series Data and window functions (partition, binning, single & double exponential smoothing):
        RANDOM_PARTITION (<training_set_size>, <validation_set_size>, <test_set_size>, [<seed>]) OVER ( [ PARTITION BY <expression> [ { , <expression> } ... ] ] ORDER BY <window_order_by_expression> )
        BINNING (VALUE => <column ref>, <binning parameter> => <expression>) OVER ([PARTITION BY …])
          <binning parameter> := BIN_COUNT | BIN_WIDTH | TILE_COUNT | STDDEV_COUNT
        SERIES_FILTER (VALUE, METHOD_NAME => <NAME>, <filter parameter> => <expression>) OVER (PARTITION BY … ORDER BY …)
          <NAME> := SINGLESMOOTH/DOUBLESMOOTH
          <filter parameter> := ALPHA|BETA
  11. New Algorithms
  12. Parameter Selection and Model Evaluation
      Enables cross-validation and parameter selection for the following PAL functions:
      • Logistic Regression
      • Naive Bayes
      • Support Vector Machine
      To avoid overfitting and to optimize model parameters, it is common to use cross-validation to evaluate model performance and perform model selection. This function is an envelope (wrapper) around different classification algorithms that provides automatic parameter selection and model evaluation during the model training phase.
      (Figure: output tables.)
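The cross-validation idea on this slide can be sketched in plain Python. This is an illustration of k-fold parameter selection in general, not PAL code; the names `k_fold_indices`, `select_parameter`, and `score_fn` are hypothetical, and the scoring callback stands in for training and evaluating a real classifier.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes into the same fold.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid


def select_parameter(candidates, score_fn, n_samples, k=5, seed=0):
    """Return (best_candidate, mean_scores); a higher score is better.

    score_fn(candidate, train_idx, valid_idx) is a hypothetical callback
    that fits a model on the training rows and scores it on the
    validation rows.
    """
    mean_scores = {}
    for candidate in candidates:
        scores = [
            score_fn(candidate, train, valid)
            for train, valid in k_fold_indices(n_samples, k, seed)
        ]
        mean_scores[candidate] = sum(scores) / len(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

Every candidate is scored on the same folds, so the mean scores are directly comparable.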
  13. Grubbs Outlier Test - Anomaly detection
      Grubbs’ test detects outliers in a given univariate data set Y by means of a hypothesis test based on Grubbs’ test statistic. The algorithm assumes that Y comes from a Gaussian distribution.
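As a rough illustration of the test statistic, the sketch below computes the two-sided Grubbs statistic G = max|y_i - mean| / s and the standard critical value formula. It is not the PAL implementation. The function names are hypothetical, and the Student t critical value (upper alpha/(2n) point with n - 2 degrees of freedom) is assumed to be looked up by the caller, since the Python standard library has no t-distribution.

```python
import math
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the point farthest from the sample mean."""
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        raise ValueError("all values are identical")
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / stdev, index


def grubbs_critical(n, t_crit):
    """Critical value of G given the caller-supplied t critical value."""
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))


def is_outlier(values, t_crit):
    """True if the most extreme point exceeds the Grubbs critical value."""
    g, _ = grubbs_statistic(values)
    return g > grubbs_critical(len(values), t_crit)
```

The test flags at most one point per run; repeated application after removing a flagged point is a common but separate procedure.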
  14. Gaussian Mixture Model (GMM)
      GMM is a probabilistic model that assumes the data points are generated by a mixture of several Gaussian distributions with unknown parameters. It can be used to cluster data points, giving the probability that each point belongs to each cluster. Each component in a GMM has its own weight, mean, and covariance matrix. The weight expresses the importance of that Gaussian distribution within the mixture; the mean and covariance matrix are the basic parameters of a Gaussian distribution. As one example, GMM can be used for image segmentation and clustering. In PAL, GMM is treated as a clustering algorithm: it describes the data and returns the probability that a sample belongs to each of the Gaussian components.
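The "probability of belonging to each cluster" can be made concrete with a small one-dimensional sketch. It assumes the component weights, means, and variances are already known, so fitting them (for example with expectation-maximization) is out of scope. The function names are hypothetical and this is not PAL code.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability that x was generated by each component."""
    joint = [
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    ]
    total = sum(joint)
    if total == 0:
        # x is so far from every component that all densities underflowed.
        raise ValueError("x has zero density under every component")
    return [j / total for j in joint]
```

A point halfway between two equally weighted components with equal variance gets probability 0.5 for each, while a point sitting on one mean is assigned almost entirely to that component.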
  15. Latent Dirichlet Allocation (LDA) – Topic Modeling
      Latent Dirichlet allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups, which in turn account for why some parts of the data are similar. It is often used in text mining for topic modeling. If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. In PAL, parameter inference is done via Gibbs sampling.
      (Figure: estimation and inference.)
  16. Confusion Matrix (CM)
      A confusion matrix is a traditional method for evaluating the performance of classification algorithms, including on multi-class problems. The PAL confusion matrix function calculates precision, recall, and the F1-score / Fβ-score.
      Input: ID, Original Label, Predicted Label
      Output: Class, Recall, Precision, F-measure, Support
      (Figure: confusion matrix, actual class versus predicted class.)
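The per-class outputs listed on this slide follow from simple counts. The sketch below is a plain-Python illustration, not the PAL function: `classification_report` is a hypothetical name, and treating an undefined ratio as 0.0 is a convention chosen here for simplicity.

```python
from collections import Counter


def classification_report(actual, predicted, beta=1.0):
    """Return {class: {recall, precision, f_measure, support}}."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    report = {}
    for cls in sorted(set(actual) | set(predicted)):
        tp = pairs[(cls, cls)]
        fp = sum(n for (a, p), n in pairs.items() if p == cls and a != cls)
        fn = sum(n for (a, p), n in pairs.items() if a == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = beta ** 2 * precision + recall
        f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        report[cls] = {
            "recall": recall,
            "precision": precision,
            "f_measure": f_measure,
            "support": tp + fn,  # number of rows whose actual label is cls
        }
    return report
```

With `beta=1.0` the F-measure is the usual F1-score; larger values of `beta` weight recall more heavily than precision.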
  17. Test for White Noise
      The algorithm identifies whether a time series is white noise. White noise means the signal power distribution is independent over time or among frequencies. In PAL, the Ljung-Box test is used to test for autocorrelation at different lags.
      Return value: 1 = white noise exists; 0 = white noise does NOT exist.
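For orientation, the Ljung-Box statistic is Q = n(n+2) Σ ρ_k² / (n - k), summed over lags k = 1..h, and it is compared with a chi-squared critical value with h degrees of freedom. The sketch below is an illustration only, not the PAL implementation. The function names are hypothetical, and the critical value is supplied by the caller (for example 3.841 for one lag or 11.070 for five lags at the 5% level) because the standard library has no chi-squared quantile function.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(series)")
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def white_noise_flag(series, max_lag, chi2_critical):
    """Return 1 if the white-noise hypothesis is not rejected, else 0."""
    return 1 if ljung_box_q(series, max_lag) <= chi2_critical else 0
```

The 1/0 return convention mirrors the one described on the slide; a large Q indicates significant autocorrelation, so the series is not white noise.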
  18. Test for Trend
      The algorithm identifies whether a time series has a trend (upward, downward, or no trend) and calculates the de-trended time series. Two methods are provided for identifying the trend: the difference-sign test and the rank test.
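The difference-sign test can be sketched as follows. It counts the number S of increases between consecutive points; for a series with no trend, S is approximately normal with mean (n - 1)/2 and variance (n + 1)/12. This is a textbook version under a normal approximation, not PAL code; the function name and the returned labels are hypothetical, and de-trending is not shown.

```python
import math
from statistics import NormalDist


def difference_sign_test(series, alpha=0.05):
    """Return (trend, z) where trend is 'upward', 'downward', or 'none'."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = number of time points where the series increased.
    s = sum(1 for prev, cur in zip(series, series[1:]) if cur > prev)
    mean = (n - 1) / 2
    std = math.sqrt((n + 1) / 12)
    z = (s - mean) / std
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    if z > z_crit:
        return "upward", z
    if z < -z_crit:
        return "downward", z
    return "none", z
```

Because it only looks at the signs of consecutive differences, this test is known to be weak on strongly periodic data, which is one reason a second method such as the rank test is useful.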
  19. Test for Seasonality
      Identifies whether seasonality exists in a time series. If it does, the corresponding seasonality model (additive or multiplicative) is identified, and the de-seasonalized series (i.e., with both trend and seasonality eliminated) is returned.
  20. Enhancements
  21. Enhancements (1 of 4)
      Random Distribution Sampling
      • Added triangular distribution support
      Logistic Regression
      • Added Elastic Net regularization (linear combination of L1 and L2 regularization) to optimize fitting of a model
      • Added cancel flag
      Multiple Linear Regression
      • Changed default optimization method to QR decomposition
      • Added cancel flag
      • Added Elastic Net regularization (linear combination of L1 and L2 regularization) to optimize fitting of a model
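To make "linear combination of L1 and L2 regularization" concrete, the sketch below evaluates one common parameterization of the elastic net penalty, λ(α‖w‖₁ + (1 - α)/2 ‖w‖₂²). The parameter names `lam` and `alpha` and this exact scaling are assumptions for illustration; the PAL parameter names and scaling may differ.

```python
def elastic_net_penalty(weights, lam, alpha):
    """Penalty lam * (alpha * L1 + (1 - alpha) / 2 * L2^2) on the weights.

    alpha = 1 gives a pure L1 (lasso) penalty, alpha = 0 a pure L2 (ridge)
    penalty. This scaling is an assumed convention, not taken from PAL.
    """
    if lam < 0 or not 0 <= alpha <= 1:
        raise ValueError("require lam >= 0 and 0 <= alpha <= 1")
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2_squared)
```

The penalty is added to the model's loss during fitting; the L1 part pushes some coefficients to exactly zero, while the L2 part shrinks correlated coefficients together.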
  22. Enhancements (2 of 4)
      Decision Trees (C4.5, CART, CHAID)
      • Enable column selection
      • Support asymmetric classes by allowing a penalty on misclassification of certain classes
      ARIMA
      • Support S-ARIMA (leverage seasonality patterns in ARIMA) and ARIMA-X
      • Improved algorithm performance
      Exponential Smoothing
      • Added damped trend to the double and triple exponential smoothing algorithms. PAL provides two types of double exponential smoothing: Holt Linear Exponential Smoothing and Additive Damped Trend Holt Linear Exponential Smoothing. Holt’s linear method projects a constant trend indefinitely into the future, and empirical evidence indicates that it tends to over-forecast. A parameter that damps the trend may therefore improve the forecast. For additive triple exponential smoothing, one additive damped method is also supported.
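The effect of damping can be seen in a short sketch of additive damped-trend double exponential smoothing. It uses the textbook recursions with a damping factor `phi`; the initialization (first value as level, first difference as trend) and all names are assumptions for illustration, not the PAL implementation.

```python
def damped_holt_forecast(series, alpha, beta, phi, horizon):
    """Forecast `horizon` steps ahead with additive damped-trend smoothing.

    alpha: level smoothing factor, beta: trend smoothing factor,
    phi: damping factor in (0, 1]; phi = 1 is Holt's linear method.
    """
    if len(series) < 2:
        raise ValueError("need at least 2 observations")
    level = series[0]
    trend = series[1] - series[0]  # assumed simple initialization
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts
```

With `phi = 1` the forecast keeps growing by the full trend at every step; with `phi < 1` each additional step adds a smaller increment, so the forecast flattens out instead of extrapolating the trend indefinitely.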
  23. Enhancements (3 of 4)
      Substitute missing values function
      • Additional substitution strategies to replace missing values with zeroes or a specified value
      Apriori
      • Add time-out capability
      • Remove the data type restriction of rule output for APRIORI2
      AprioriLite
      • Introduce a time-out in the algorithm
      FP-Growth
      • Provide relational table output for generated rules
      • Introduce a time-out in the algorithm, add cancel flag
      • Add extra filter parameters like LHS, RHS, MAX_CONSEQUENT
      • Add parallelism for rule generation
  24. Enhancements (4 of 4)
      Neural Network BP
      • Added multiple thread support for batch training
      • Variable learning rate for stochastic training
      • Enhanced stochastic descent optimization
      • Added multiple thread support for prediction
      • Added SoftMax output parameter for prediction
      SVM
      • Support categorical variables
      • Enhance output tables with more statistics
      DBSCAN
      • Support categorical variables
  25. Documentation
  26. How to find SAP HANA documentation on this topic?
      SAP HANA Platform SPS
      • What’s New – Release Notes
      • Installation – SAP HANA Server Installation Guide
      • Administration – SAP HANA Administration Guide
      • SAP HANA Options – SAP HANA Predictive Analysis Library (PAL) Reference
      • Development – SAP HANA Developer Guide
      In addition to this learning material, you can find SAP HANA documentation on the SAP Help Portal knowledge center at http://help.sap.com/hana_platform. The knowledge center is structured according to the product lifecycle: installation, security, administration, development. For example, the SAP HANA Predictive Analysis Library (PAL) Reference is in the SAP HANA Options section.
  27. Application Function Modeler
  28. Web-Based Application Function Modeler for SAP HANA: Graphical Re-Design and DataFlow Enhancements with HANA SPS10
      New Web-based Flowgraph Editor
      • Support opening flowgraph file
      • Support for AFL transform
      • Support for R-Script transform
      • Support for SDI/SDQ
      • Create Procedure or Task runtime options
      • Interoperability of flowgraph editors: a flowgraph created in the web-based editor works in the SAP HANA studio editor and vice versa
      (Figure: graphical dataflow modeling = compose application function calls (PAL, BFL, …) + writing custom operators in R + information management & data quality operations + standard set operations.)
      Note: Not all AFL functions will be supported in the first SPS10 revision. PAL is NOT supported in the first SPS10 revision.
  29. Web-Based Application Function Modeler for SAP HANA: AFL node enables selection of function by area
      • AFL functions do not individually appear in the palette as they do in SAP HANA studio.
      • One AFL node can be selected.
      • The AFL function is then selected.
      Note: Not all AFL functions are supported in the initial SPS10 revision. PAL is NOT supported in the first SPS10 revision.
  30. Application Function Modeler Enhancements SPS10
      General
      • Support opening empty flowgraph file
      • Update the regular expression in existing PAL palette template to fit PAL SPS10
      • Enhanced flow validation, NCLOB support
      Added new PAL functions into the palette template (9 new base functions)
      • Time Series: Trend Test, White Noise Test, Seasonality Test
      • Clustering: GMM, LDA Estimate, LDA Inference
      • Classification: PSME (Parameter selection and model evaluation), Confusion Matrix
      • Statistics: Grubbs Test
      Update the function parameters in existing PAL palette template
      • Remove deprecated parameters
      • Add new available parameters
      • Update the default value for some existing parameters
      Added PAL Overload Functions (14 functions)
      • Clustering: ANOMALYDETECTION__OVERLOAD_2_3, KMEANS__OVERLOAD_2_4, LDAESTIMATE__OVERLOAD_2_6, LDAINFERENCE__OVERLOAD_5_2, LDAINFERENCE__OVERLOAD_5_3
      • Time Series: SINGLESMOOTH__OVERLOAD_2_2, DOUBLEMOOTH__OVERLOAD_2_2, TRIPLESMOOTH__OVERLOAD_2_2
      • Classification: PREDICTWITHBPNN__OVERLOAD_3_2, PSME__OVERLOAD_3_4, PSME__OVERLOAD_3_5, SVMPREDICT__OVERLOAD_5_1, SVMTRAIN__OVERLOAD_2_3
      • Association: FPGROWTH__OVERLOAD_2_3
      Note: the function <Base Name>__OVERLOAD_2_3 is the overload function with 2 input tables and 3 output tables.
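The overload naming convention in the note can be decoded mechanically. The helper below is a hypothetical illustration of that convention only (base name, then `__OVERLOAD_`, then the input and output table counts); it is not part of the Application Function Modeler or PAL.

```python
import re

# <Base Name>__OVERLOAD_<input tables>_<output tables>
_OVERLOAD_PATTERN = re.compile(
    r"^(?P<base>.+?)__OVERLOAD_(?P<inputs>\d+)_(?P<outputs>\d+)$"
)


def parse_overload(name):
    """Return (base_name, input_tables, output_tables), or None if no match."""
    match = _OVERLOAD_PATTERN.match(name)
    if match is None:
        return None
    return match["base"], int(match["inputs"]), int(match["outputs"])
```

For example, `parse_overload("KMEANS__OVERLOAD_2_4")` yields the base name `KMEANS` with 2 input tables and 4 output tables.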
  31. How to find SAP HANA documentation on this topic?
      SAP HANA Platform documentation
      • What’s New – Release Notes
      • Modeling Information – SAP HANA Modeling Guide
      • Development Information – SAP HANA Developer Guide
      • References – SAP HANA SQL Reference
      In addition to this learning material, you can find SAP HANA documentation on the SAP Help Portal knowledge center at http://help.sap.com/hana_platform. The knowledge center is structured according to the product lifecycle: installation > security > administration > modeling > development. For example, the SAP HANA Developer Guide is in the Development section.
  32. How to find SAP HANA demo examples on this topic? Go online to https://www.youtube.com/user/saphanaacademy.
  33. Thank you. Contact information: Mark Hourani, SAP HANA Product Management, AskSAPHANA@sap.com
  34. © 2015 SAP SE or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices. Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
  35. SAP HANA Predictive Analysis Library Roadmap. Product road map overview: key themes and capabilities
      Today (Release SPS09)
      Framework
      • SAP HANA built-in C++ library for Advanced Analytics
      • Accessible from Application Function Layer via SQLScript, SAP Predictive Analysis, and Application Function Modeler
      Algorithms and Functions
      • Regression: Multiple Linear Regression, Polynomial / Exponential / Bi-Variate Geometric / Logarithmic Regression
      • Classification (Multi-class): Log. Regression, SVM, C4.5 & CHAID & CART Dec. Trees, KNN, Naive Bayes, Neural Network…
      • Clustering: K-means/median/Medoid, DBSCAN, Hierarchical clustering, Affinity Propagation…
      • Association Rule Learning: Apriori / Apriori Lite, FP-Growth
      • Time Series Analysis: Exponential/AHEAD smoothing, ARIMA-X, Croston method
      • Others: Distribution fit, Random sampling, Link prediction, ABC analysis, Univariate / multivariate statistics…
      Planned Innovations (To be released in SPS10)
      Framework
      • Enhanced PAL exception handling for HANA SQL handler
      • Disable L Interface, except for Apriori, Apriori Lite, and Logistic Regression
      • PAL function integration with SAP HANA Series Data and window function (single & double exponential smoothing, binning, partition)
      Algorithms
      • Topic model algorithm (Latent Dirichlet allocation, LDA)
      • Time series enhancement (Seasonal ARIMA & ARIMA-X, test of trend/seasonality/white noise, …)
      • Grubbs Outlier Test - Anomaly detection
      • Gaussian Mixture Model (clustering with probability)
      • Confusion Matrix - Classification evaluation
      • Enable cross-validation for some PAL functions: Naïve Bayes, SVM and Logistic Regression
      • Various enhancements to many algorithms
      Future Direction
      Framework
      • Generalized storage and consumption of predictive models
      • Simplified programming approach
      • Architecture/Interface improvements
      • Enable PAL functions in distributed HANA environment
      • Built-in PAL functions in HANA without explicit AFL installation
      Algorithms
      • Meta-heuristic optimization
      • Unified linear model (Generalized linear model, Generalized Additive Model)
      • Sparse data analysis
      • Tests of goodness of fit enhancements
      • Random Forest
      • Recommendation algorithms
      • Discriminant analysis
      • Bayesian optimization
      • Matrix factorization
      • …
      This is the current state of planning and may be changed by SAP at any time.
