Your SlideShare is downloading. ×
PMML - Predictive Model Markup Language
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

PMML - Predictive Model Markup Language

5,716
views

Published on

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a …

PMML (Predictive Model Markup Language) provides a standard way to represent data mining models so that these can be shared between different statistical applications. Not only PMML can represent a wide range of statistical techniques, but it can also be used to represent the data transformations necessary to transform raw data into meaningful feature detectors. In this way, PMML offers a standard to represent data manipulation and modeling in a single and concise way.

Published in: Technology

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
5,716
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
6
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. PMML Overview
  • 2. PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing) PMML Predictive Model Markup Language Transformations
    • PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
    • It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
    • PMML allows for the clear separation of tasks: Model development vs. model deployment. As a consequence, scientists can focus on building the best model.
    • PMML eliminates need for custom model deployment and ensures scalability and reliability.
    Models
  • 3. Matured and Supported by Industry PMML PMML Industry Support
    • Data Mining Group http:// www.dmg.org
    • Mature standard
      • Current version 4.0 (just released)
      • Active group and constant enhancements
    • Vendor independent consortium
    • Industry supporters
      • Major Players: IBM, Oracle, SAP, Microsoft
      • Analytics: KXEN, SAS, Salford, SPSS, Zementis
      • BI: Microstrategy, Teradata, Tibco
      • Open Source: KNIME, R, Rapid-I
      • Others: Equifax, FICO, Open Data Group, Visa, Pervasive
  • 4. PMML Components
    • A Data Dictionary defines all the raw data fields (including missing value strategy and outlier treatment).
    • Several Data Transformations strategies allow for intelligent extraction of feature detectors from raw data (“data massaging”).
    • A comprehensive list of Data-Mining Models offers power and flexibility.
    • Post-processing of results allow for tailored decisions.
    • Model Explanation allows for performance evaluation.
  • 5. PMML Files 3) Post-Processing Scaling of model outputs can be performed with PMML element Targets 1) Pre-Processing PMML elements Transformations, Mining Schema and Functions allow for effective pre-processing 2) Models PMML allows for several predictive modeling techniques to be fully expressed PMML
  • 6. PMML: Data Pre-Processing
    • Data Dictionary : Allows for the explicit specification of valid, invalid and missing values.
    • Mining Schema : Used to define the appropriate treatment to be applied to missing and invalid values.
    • Transformations : Allow for variable discretization, normalization, and mapping with handling of missing and default values.
    • Built-in Functions : Arithmetic expressions, handling of date and time as well as strings. Also used for implementing IF-THEN-ELSE logic and Boolean operations.
    Data Pre-Processing 1
  • 7. Data Pre-Processing: PMML Example Arbitrary Piecewise Linear Function This PMML code implements: Var_b:=interpolate(Var_a,((100,0),(200,1),(800,3),(900,4))) See http://www.dmg.org/v3-2/Transformations.html - look for element NormContinuous.
  • 8. Modeling Elements
    • PMML allows for several predictive modeling techniques to be expressed directly. Supported techniques which have their own elements are:
      • Regression and General Regression
      • Neural Networks
      • Support Vector Machines
      • Decision Trees
      • Naïve Bayes
      • Clustering
      • Sequences
      • Rule Sets
      • Association Rules
      • Time-Series (as of PMML 4.0)
      • Text Models
      • Support for Multiple Models
    Easy Expression of Predictive Models 2
  • 9. Modeling Elements: PMML Example for Neural Network
  • 10. The PMML code below implements score post-processing. It uses the PMML element Targets for checking boundaries ( min and max ) and to rescale ( rescaleConstant and rescaleFactor ) the original score generated by model See http://www.dmg.org/v3-2/Targets.html Data Post-Processing: PMML Example 3
  • 11. Applications Service Providers External Vendors Divisions One Standard, One Process
  • 12. PMML = Easy Model Deployment Model Deployment Model Building PMML
  • 13. PMML - Zementis Contributions
    • ADAPA : A decision engine that deploys models expressed in PMML and executes them in real-time. Now available as a service on the Amazon Cloud.
    • PMML Converter : Validates, converts, and corrects old and new PMML code. Available at the DMG website and at http://www.zementis.com/pmml.htm .
    • Contributing Member of the DMG : Submitted several proposals for PMML 4.0 and already working with other members on PMML 4.1.
    • Code contributor for the R PMML package (available on CRAN).
    • PMML Articles : R Journal and SIGKDD Explorations Newsletter. Available for downloading at http:// www.zementis.com/manual.htm
    • PMML Blogs : Several blogs on PMML topics ( http://adapasupport.zementis.com and http ://www.predictive-analytics.info ).
  • 14. Thank You! U.S.A Headquarters Asia Office E-mail: [email_address] 19/F., Unit A Ho Lee Commercial Building 38-44 D’Aguilar Street Central, Hong Kong (S.A.R.) Tel: +852 2868-0878 Fax: +852 2845-6027 6125 Cornerstone Court East Suite 250 San Diego, CA, 92121 Tel: +1 619 330-0780 Fax: +1 858 535-0227

×