• Save
PMML - Predictive Model Markup Language
Upcoming SlideShare
Loading in...5
×
 

PMML - Predictive Model Markup Language

on

  • 6,205 views

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a ...

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a wide range of statistical techniques, but it can also be used to represent the data transformations necessary to transform raw data into meaningful feature detectors. In this way, PMML offers a standard to represent data manipulation and modeling in a single and concise way.

Statistics

Views

Total Views
6,205
Views on SlideShare
6,189
Embed Views
16

Actions

Likes
4
Downloads
0
Comments
0

1 Embed 16

http://www.slideshare.net 16

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

PMML - Predictive Model Markup Language PMML - Predictive Model Markup Language Presentation Transcript

  • PMML Overview
  • PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing) PMML Predictive Model Markup Language Transformations
    • PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
    • It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
    • PMML allows for the clear separation of tasks: Model development vs. model deployment. As a consequence, scientists can focus on building the best model.
    • PMML eliminates need for custom model deployment and ensures scalability and reliability.
    Models
  • Matured and Supported by Industry PMML PMML Industry Support
    • Data Mining Group http:// www.dmg.org
    • Mature standard
      • Current version 4.0 (just released)
      • Active group and constant enhancements
    • Vendor independent consortium
    • Industry supporters
      • Major Players: IBM, Oracle, SAP, Microsoft
      • Analytics: KXEN, SAS, Salford, SPSS, Zementis
      • BI: Microstrategy, Teradata, Tibco
      • Open Source: KNIME, R, Rapid-I
      • Others: Equifax, FICO, Open Data Group, Visa, Pervasive
  • PMML Components
    • A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
    • Several Data Transformations strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
    • A comprehensive list of Data-Mining Models offers power and flexibility.
    • Post-processing of results allow for tailored decisions.
    • Model Explanation allows for performance evaluation.
  • PMML Files 3) Post-Processing Scaling of model outputs can be performed with PMML element Targets 1) Pre-Processing PMML elements Transformations, Mining Schema and Functions allow for effective pre-processing 2) Models PMML allows for several predictive modeling techniques to be fully expressed PMML
  • PMML: Data Pre-Processing
    • Data Dictionary : Allows for the explicit specification of valid, invalid and missing values.
    • Mining Schema : Used to define the appropriate treatment to be applied to missing and invalid values.
    • Transformations : Allow for variable discretization, normalization, and mapping with handling of missing and default values.
    • Built-in Functions : Arithmetic expressions, handling of date and time as well as strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations.
    Data Pre-Processing 1
  • Data Pre-Processing: PMML Example Arbitrary Piecewise Linear Function This PMML code implements: Var_b:=interpolate(Var_a,((100,0),(200,1),(800,3),(900,4))) See http://www.dmg.org/v3-2/Transformations.html - look for element NormContinuous.
  • Modeling Elements
    • PMML allows for several predictive modeling techniques to be expressed directly. Supported techniques which have their own elements are:
      • Regression and General Regression
      • Neural Networks
      • Support Vector Machines
      • Decision Trees
      • Naïve Bayes
      • Clustering
      • Sequences
      • Rule Sets
      • Association Rules
      • Time-Series (as of PMML 4.0)
      • Text Models
      • Support for Multiple Models
    Easy Expression of Predictive Models 2
  • Modeling Elements: PMML Example for Neural Network
  • The PMML code below implements score post-processing. It uses the PMML element Targets for checking boundaries ( min and max ) and to rescale ( rescaleConstant and rescaleFactor ) the original score generated by model See http://www.dmg.org/v3-2/Targets.html Data Post-Processing: PMML Example 3
  • Applications Service Providers External Vendors Divisions One Standard, One Process
  • PMML = Easy Model Deployment Model Deployment Model Building PMML
  • PMML - Zementis Contributions
    • ADAPA : A decision engine that deploys models expressed in PMML and executes them in real-time. Now available as a service on the Amazon Cloud.
    • PMML Converter : Validates, converts, and corrects old and new PMML code. Available at the DMG website and at http://www.zementis.com/pmml.htm .
    • Contributing Member of the DMG : Submitted several proposals for PMML 4.0 and already working with other members on PMML 4.1.
    • Code contributor for the R PMML package (available on CRAN).
    • PMML Articles : R Journal and SIGKDD Explorations Newsletter. Available for downloading at http:// www.zementis.com/manual.htm
    • PMML Blogs : Several blogs on PMML topics ( http://adapasupport.zementis.com and http ://www.predictive-analytics.info ).
  • Thank You! U.S.A Headquarters Asia Office E-mail: [email_address] 19/F., Unit A Ho Lee Commercial Building 38-44 D’Aguilar Street Central, Hong Kong (S.A.R.) Tel: +852 2868-0878 Fax: +852 2845-6027 6125 Cornerstone Court East Suite 250 San Diego, CA, 92121 Tel: +1 619 330-0780 Fax: +1 858 535-0227