• Save
PMML - Predictive Model Markup Language
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

PMML - Predictive Model Markup Language

  • 6,753 views
Uploaded on

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a......

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a wide range of statistical techniques, but it can also be used to represent the data transformations necessary to transform raw data into meaningful feature detectors. In this way, PMML offers a standard to represent data manipulation and modeling in a single and concise way.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
6,753
On Slideshare
6,737
From Embeds
16
Number of Embeds
1

Actions

Shares
Downloads
0
Comments
0
Likes
4

Embeds 16

http://www.slideshare.net 16

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. PMML Overview
  • 2. PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing) PMML Predictive Model Markup Language Transformations
    • PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
    • It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
    • PMML allows for the clear separation of tasks: Model development vs. model deployment. As a consequence, scientists can focus on building the best model.
    • PMML eliminates need for custom model deployment and ensures scalability and reliability.
    Models
  • 3. Matured and Supported by Industry PMML PMML Industry Support
    • Data Mining Group http:// www.dmg.org
    • Mature standard
      • Current version 4.0 (just released)
      • Active group and constant enhancements
    • Vendor independent consortium
    • Industry supporters
      • Major Players: IBM, Oracle, SAP, Microsoft
      • Analytics: KXEN, SAS, Salford, SPSS, Zementis
      • BI: Microstrategy, Teradata, Tibco
      • Open Source: KNIME, R, Rapid-I
      • Others: Equifax, FICO, Open Data Group, Visa, Pervasive
  • 4. PMML Components
    • A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
    • Several Data Transformations strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
    • A comprehensive list of Data-Mining Models offers power and flexibility.
    • Post-processing of results allow for tailored decisions.
    • Model Explanation allows for performance evaluation.
  • 5. PMML Files 3) Post-Processing Scaling of model outputs can be performed with PMML element Targets 1) Pre-Processing PMML elements Transformations, Mining Schema and Functions allow for effective pre-processing 2) Models PMML allows for several predictive modeling techniques to be fully expressed PMML
  • 6. PMML: Data Pre-Processing
    • Data Dictionary : Allows for the explicit specification of valid, invalid and missing values.
    • Mining Schema : Used to define the appropriate treatment to be applied to missing and invalid values.
    • Transformations : Allow for variable discretization, normalization, and mapping with handling of missing and default values.
    • Built-in Functions : Arithmetic expressions, handling of date and time as well as strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations.
    Data Pre-Processing 1
  • 7. Data Pre-Processing: PMML Example Arbitrary Piecewise Linear Function This PMML code implements: Var_b:=interpolate(Var_a,((100,0),(200,1),(800,3),(900,4))) See http://www.dmg.org/v3-2/Transformations.html - look for element NormContinuous.
  • 8. Modeling Elements
    • PMML allows for several predictive modeling techniques to be expressed directly. Supported techniques which have their own elements are:
      • Regression and General Regression
      • Neural Networks
      • Support Vector Machines
      • Decision Trees
      • Naïve Bayes
      • Clustering
      • Sequences
      • Rule Sets
      • Association Rules
      • Time-Series (as of PMML 4.0)
      • Text Models
      • Support for Multiple Models
    Easy Expression of Predictive Models 2
  • 9. Modeling Elements: PMML Example for Neural Network
  • 10. The PMML code below implements score post-processing. It uses the PMML element Targets for checking boundaries ( min and max ) and to rescale ( rescaleConstant and rescaleFactor ) the original score generated by model See http://www.dmg.org/v3-2/Targets.html Data Post-Processing: PMML Example 3
  • 11. Applications Service Providers External Vendors Divisions One Standard, One Process
  • 12. PMML = Easy Model Deployment Model Deployment Model Building PMML
  • 13. PMML - Zementis Contributions
    • ADAPA : A decision engine that deploys models expressed in PMML and executes them in real-time. Now available as a service on the Amazon Cloud.
    • PMML Converter : Validates, converts, and corrects old and new PMML code. Available at the DMG website and at http://www.zementis.com/pmml.htm .
    • Contributing Member of the DMG : Submitted several proposals for PMML 4.0 and already working with other members on PMML 4.1.
    • Code contributor for the R PMML package (available on CRAN).
    • PMML Articles : R Journal and SIGKDD Explorations Newsletter. Available for downloading at http:// www.zementis.com/manual.htm
    • PMML Blogs : Several blogs on PMML topics ( http://adapasupport.zementis.com and http ://www.predictive-analytics.info ).
  • 14. Thank You! U.S.A Headquarters Asia Office E-mail: [email_address] 19/F., Unit A Ho Lee Commercial Building 38-44 D’Aguilar Street Central, Hong Kong (S.A.R.) Tel: +852 2868-0878 Fax: +852 2845-6027 6125 Cornerstone Court East Suite 250 San Diego, CA, 92121 Tel: +1 619 330-0780 Fax: +1 858 535-0227