PMML - Predictive Model Markup Language


Published on

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a wide range of statistical techniques, but it can also be used to represent the data transformations necessary to transform raw data into meaningful feature detectors. In this way, PMML offers a standard to represent data manipulation and modeling in a single and concise way.

Published in: Technology

PMML - Predictive Model Markup Language

  1. 1. PMML Overview
  2. 2. PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing) PMML Predictive Model Markup Language Transformations <ul><li>PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications. </li></ul><ul><li>It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models. </li></ul><ul><li>PMML allows for the clear separation of tasks: Model development vs. model deployment. As a consequence, scientists can focus on building the best model. </li></ul><ul><li>PMML eliminates need for custom model deployment and ensures scalability and reliability. </li></ul>Models
  3. 3. Matured and Supported by Industry PMML PMML Industry Support <ul><li>Data Mining Group http:// </li></ul><ul><li>Mature standard </li></ul><ul><ul><li>Current version 4.0 (just released) </li></ul></ul><ul><ul><li>Active group and constant enhancements </li></ul></ul><ul><li>Vendor independent consortium </li></ul><ul><li>Industry supporters </li></ul><ul><ul><li>Major Players: IBM, Oracle, SAP, Microsoft </li></ul></ul><ul><ul><li>Analytics: KXEN, SAS, Salford, SPSS, Zementis </li></ul></ul><ul><ul><li>BI: Microstrategy, Teradata, Tibco </li></ul></ul><ul><ul><li>Open Source: KNIME, R, Rapid-I </li></ul></ul><ul><ul><li>Others: Equifax, FICO, Open Data Group, Visa, Pervasive </li></ul></ul>
  4. 4. PMML Components <ul><li>A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment). </li></ul><ul><li>Several Data Transformations strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”). </li></ul><ul><li>A comprehensive list of Data-Mining Models offers power and flexibility. </li></ul><ul><li>Post-processing of results allow for tailored decisions. </li></ul><ul><li>Model Explanation allows for performance evaluation. </li></ul>
  5. 5. PMML Files 3) Post-Processing Scaling of model outputs can be performed with PMML element Targets 1) Pre-Processing PMML elements Transformations, Mining Schema and Functions allow for effective pre-processing 2) Models PMML allows for several predictive modeling techniques to be fully expressed PMML
  6. 6. PMML: Data Pre-Processing <ul><li>Data Dictionary : Allows for the explicit specification of valid, invalid and missing values. </li></ul><ul><li>Mining Schema : Used to define the appropriate treatment to be applied to missing and invalid values. </li></ul><ul><li>Transformations : Allow for variable discretization, normalization, and mapping with handling of missing and default values. </li></ul><ul><li>Built-in Functions : Arithmetic expressions, handling of date and time as well as strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations. </li></ul>Data Pre-Processing 1
  7. 7. Data Pre-Processing: PMML Example Arbitrary Piecewise Linear Function This PMML code implements: Var_b:=interpolate(Var_a,((100,0),(200,1),(800,3),(900,4))) See - look for element NormContinuous.
  8. 8. Modeling Elements <ul><li>PMML allows for several predictive modeling techniques to be expressed directly. Supported techniques which have their own elements are: </li></ul><ul><ul><li>Regression and General Regression </li></ul></ul><ul><ul><li>Neural Networks </li></ul></ul><ul><ul><li>Support Vector Machines </li></ul></ul><ul><ul><li>Decision Trees </li></ul></ul><ul><ul><li>Naïve Bayes </li></ul></ul><ul><ul><li>Clustering </li></ul></ul><ul><ul><li>Sequences </li></ul></ul><ul><ul><li>Rule Sets </li></ul></ul><ul><ul><li>Association Rules </li></ul></ul><ul><ul><li>Time-Series (as of PMML 4.0) </li></ul></ul><ul><ul><li>Text Models </li></ul></ul><ul><ul><li>Support for Multiple Models </li></ul></ul>Easy Expression of Predictive Models 2
  9. 9. Modeling Elements: PMML Example for Neural Network
  10. 10. The PMML code below implements score post-processing. It uses the PMML element Targets for checking boundaries ( min and max ) and to rescale ( rescaleConstant and rescaleFactor ) the original score generated by model See Data Post-Processing: PMML Example 3
  11. 11. Applications Service Providers External Vendors Divisions One Standard, One Process
  12. 12. PMML = Easy Model Deployment Model Deployment Model Building PMML
  13. 13. PMML - Zementis Contributions <ul><li>ADAPA : A decision engine that deploys models expressed in PMML and executes them in real-time. Now available as a service on the Amazon Cloud. </li></ul><ul><li>PMML Converter : Validates, converts, and corrects old and new PMML code. Available at the DMG website and at . </li></ul><ul><li>Contributing Member of the DMG : Submitted several proposals for PMML 4.0 and already working with other members on PMML 4.1. </li></ul><ul><li>Code contributor for the R PMML package (available on CRAN). </li></ul><ul><li>PMML Articles : R Journal and SIGKDD Explorations Newsletter. Available for downloading at http:// </li></ul><ul><li>PMML Blogs : Several blogs on PMML topics ( and http :// ). </li></ul>
  14. 14. Thank You! U.S.A Headquarters Asia Office E-mail: [email_address] 19/F., Unit A Ho Lee Commercial Building 38-44 D’Aguilar Street Central, Hong Kong (S.A.R.) Tel: +852 2868-0878 Fax: +852 2845-6027 6125 Cornerstone Court East Suite 250 San Diego, CA, 92121 Tel: +1 619 330-0780 Fax: +1 858 535-0227