MS SQL SERVER: Data mining concepts and DMX

    • Data Mining Concepts
    • Overview
      History of DMX
      DMX Introduction
      DMX objects
      Query Syntax
    • History of DMX
      DMX was first introduced in the OLE DB for Data Mining specification authored by Microsoft in conjunction with other vendors in 1999.
      The goal of DMX is to define common concepts and common query expressions for data mining, much as SQL did for relational databases.
    • Overview of DMX
      Data Mining Extensions (DMX) is a query language for data mining models. It consists of:
      • DDL (Data Definition Language)
      The DDL of DMX is used to create new data mining models and mining structures, export and import mining structures, copy or transfer data from one mining model to another, and delete existing mining models and mining structures.
      • DML (Data Manipulation Language)
      The DML of DMX is used to browse and search the data in mining models, update mining models by inserting data, and derive predictions with prediction queries.
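      As a rough illustration of the DDL/DML split, the hypothetical Python helper below sorts a DMX statement into one group by its leading keywords. The keyword lists are an assumption based on how Microsoft's DMX reference groups its statements (CREATE, ALTER, DROP, EXPORT, IMPORT and SELECT INTO as definition statements; INSERT INTO, SELECT, UPDATE and DELETE as manipulation statements). The helper does not parse or execute DMX.

```python
# Hypothetical helper: it only looks at leading keywords and never parses or runs DMX.
DDL_PREFIXES = (
    "CREATE MINING STRUCTURE",
    "CREATE MINING MODEL",
    "ALTER MINING STRUCTURE",
    "DROP MINING STRUCTURE",
    "DROP MINING MODEL",
    "SELECT INTO",  # copies an existing model into a new one
    "EXPORT",
    "IMPORT",
)
DML_PREFIXES = ("INSERT INTO", "SELECT", "UPDATE", "DELETE")


def classify_dmx(statement: str) -> str:
    """Return 'DDL', 'DML' or 'UNKNOWN' for a DMX statement."""
    # Normalise whitespace and case so keyword prefixes can be compared directly.
    text = " ".join(statement.split()).upper()

    def starts_with(prefix: str) -> bool:
        # Require a word boundary so that, e.g., "SELECTED" is not read as "SELECT".
        return text == prefix or text.startswith(prefix + " ")

    # DDL is checked first because "SELECT INTO" also starts with "SELECT".
    if any(starts_with(p) for p in DDL_PREFIXES):
        return "DDL"
    if any(starts_with(p) for p in DML_PREFIXES):
        return "DML"
    return "UNKNOWN"


print(classify_dmx("DROP MINING MODEL [NBSample]"))  # DDL
```

      A keyword check is enough here because every DMX statement announces its kind in its first few words; a real tool would use a proper parser.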
    • DMX objects
      Data Mining Extensions (DMX) is a language that you can use to create and work with data mining models in Microsoft SQL Server Analysis Services.
      DMX is used to create the structure of new data mining models, to train these models, and to browse, manage, and predict against them.
      DMX works with two major objects:
      • The mining structure
      • The mining model
    • The mining structure
      A mining structure is defined as a list of columns, with their data types and information describing how they should be handled.
      When a mining structure is processed, it contains a compressed cache or copy of the source data.
      This cache is used to train any models that are subsequently added to the structure, and it can be queried to return its data or the distinct states that exist in any structure column.
      The cache is maintained only temporarily and can be dropped at any time.
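      A minimal Python sketch of these ideas, assuming a simplified in-memory model rather than the Analysis Services object model: the structure is a list of column definitions (name, data type, content type), processing copies the source rows into a cache, and the cache can be queried for distinct column states or dropped. The class names `StructureColumn` and `MiningStructure` and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StructureColumn:
    name: str
    data_type: str     # for example "LONG" or "TEXT"
    content_type: str  # for example "KEY" or "DISCRETE"


@dataclass
class MiningStructure:
    """Hypothetical in-memory stand-in for a mining structure."""
    name: str
    columns: list
    cache: list = field(default_factory=list, init=False)

    def process(self, source_rows):
        # Processing copies only the structure's columns into the cache.
        names = [col.name for col in self.columns]
        self.cache = [{n: row[n] for n in names} for row in source_rows]

    def distinct_states(self, column_name):
        # Distinct values seen in one structure column, in first-seen order.
        states = []
        for row in self.cache:
            if row[column_name] not in states:
                states.append(row[column_name])
        return states

    def clear_cache(self):
        # Dropping the cache discards the data copy but keeps the column definitions.
        self.cache = []


# Usage with some of the New Mailing columns and made-up source rows.
mailing = MiningStructure("New Mailing", [
    StructureColumn("CustomerKey", "LONG", "KEY"),
    StructureColumn("Gender", "TEXT", "DISCRETE"),
    StructureColumn("Bike Buyer", "LONG", "DISCRETE"),
])
mailing.process([
    {"CustomerKey": 1, "Gender": "F", "Bike Buyer": 1, "Region": "North"},
    {"CustomerKey": 2, "Gender": "M", "Bike Buyer": 0, "Region": "South"},
    {"CustomerKey": 3, "Gender": "F", "Bike Buyer": 0, "Region": "North"},
])
print(mailing.distinct_states("Gender"))  # ['F', 'M']
```

      The extra `Region` source column never reaches the cache, because only columns defined in the structure are copied.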
    • The mining model
      A mining model is the object that transforms rows of data into cases and performs the machine learning using a specified data mining algorithm.
      A mining model is described by a subset of columns from the structure, how those columns are to be used as attributes, and the algorithm and parameters used to perform machine learning on the structure data.
      A trained model can be queried for predictions, and statistics about those predictions are available as well. Additionally, the learned patterns themselves can be queried to discover what the algorithm found.
      These patterns are generally referred to as the model content.
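      The following Python sketch shows how a mining model relates to its structure, under simplifying assumptions: the model names a key column, a subset of input columns and one predictable column, turns structure rows into cases that contain only those columns, and stores what it learns as queryable content. The "algorithm" here is a hypothetical frequency lookup (the most common predictable value for each combination of input states), not one of the Analysis Services algorithms; the class name and rows are made up.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical model over a subset of structure columns."""

    def __init__(self, key, inputs, predict):
        self.key = key
        self.inputs = list(inputs)
        self.predict = predict
        self.content = {}  # learned patterns: input states -> Counter of outcomes

    def to_case(self, row):
        # Keep only the columns this model uses; other structure columns are ignored.
        return {c: row[c] for c in [self.key, *self.inputs, self.predict]}

    def train(self, rows):
        counts = defaultdict(Counter)
        for row in rows:
            case = self.to_case(row)
            states = tuple(case[c] for c in self.inputs)
            counts[states][case[self.predict]] += 1
        self.content = dict(counts)

    def predict_value(self, row):
        # Most frequent outcome for these input states, or None if never seen.
        outcomes = self.content.get(tuple(row[c] for c in self.inputs))
        return outcomes.most_common(1)[0][0] if outcomes else None


model = ToyMiningModel("CustomerKey", ["Number Cars Owned"], "Bike Buyer")
model.train([
    {"CustomerKey": 1, "Gender": "M", "Number Cars Owned": 0, "Bike Buyer": 1},
    {"CustomerKey": 2, "Gender": "F", "Number Cars Owned": 0, "Bike Buyer": 1},
    {"CustomerKey": 3, "Gender": "M", "Number Cars Owned": 0, "Bike Buyer": 0},
    {"CustomerKey": 4, "Gender": "F", "Number Cars Owned": 2, "Bike Buyer": 0},
])
print(model.content[(0,)])  # Counter({1: 2, 0: 1})
print(model.predict_value({"Number Cars Owned": 2}))  # 0
```

      Here `content` plays the role of the model content: it can be inspected directly to see which patterns were learned, independently of making predictions.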
    • Query Syntax
      DMX statements are used to create, process, delete, copy, browse, and predict against data mining models.
      The three basic steps of the data mining process are:
      • Creation
      • Training
      • Prediction
      Creating mining structures is similar to creating tables in SQL.
      CREATE MINING STRUCTURE <structure>
      ( [(<column definition list>)] )
      structure: A unique name for the structure.
      column definition list: A comma-separated list of column definitions.
      The following example creates a new mining structure called New Mailing.
      CREATE MINING STRUCTURE [New Mailing]
      ( CustomerKey LONG KEY,
      Gender TEXT DISCRETE,
      [Number Cars Owned] LONG DISCRETE,
      [Bike Buyer] LONG DISCRETE )
      ALTER MINING STRUCTURE creates a new mining model that is based on an existing mining structure.
      When you use the ALTER MINING STRUCTURE statement to create a new mining model, the structure must already exist.
      ALTER MINING STRUCTURE <structure>
      ADD MINING MODEL <model>
      ( <column definition list> [(<nested column definition list>) [WITH FILTER (<nested filter criteria>)]] )
      USING <algorithm> [(<parameter list>)]
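      The WITH FILTER keyword applies a filter condition, similar to a WHERE clause, so that only matching nested table rows are used to train the model.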
      The following example adds a Naive Bayes mining model to the New Mailing mining structure and limits the maximum number of attribute states to 50.
      ALTER MINING STRUCTURE [New Mailing]
      ADD MINING MODEL [Naive Bayes]
      ( CustomerKey,
      [Number Cars Owned],
      [Bike Buyer] PREDICT )
      USING Microsoft_Naive_Bayes (MAXIMUM_STATES = 50)
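      To show roughly what such a model computes, here is a small categorical Naive Bayes classifier in Python with a `maximum_states` setting. It assumes MAXIMUM_STATES behaves as Microsoft describes it for the Naive Bayes algorithm (only the most popular states of an attribute are kept and the rest are treated as missing); the Laplace smoothing, the `Missing` label and the sample rows are assumptions of this sketch, which is not Microsoft's implementation. `Gender` is added as a second input only to show how per-attribute probabilities combine.

```python
import math
from collections import Counter, defaultdict

MISSING = "Missing"  # label used here for states dropped by maximum_states


def limit_states(values, maximum_states):
    """Keep the most popular states and map all other states to MISSING."""
    keep = {state for state, _ in Counter(values).most_common(maximum_states)}
    return [v if v in keep else MISSING for v in values]


class CategoricalNaiveBayes:
    def __init__(self, maximum_states=50, alpha=1.0):
        self.maximum_states = maximum_states
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, rows, inputs, target):
        self.inputs, self.target = list(inputs), target
        columns = {c: limit_states([r[c] for r in rows], self.maximum_states)
                   for c in self.inputs}
        # Possible states per input: the kept states plus MISSING.
        self.kept = {c: set(columns[c]) | {MISSING} for c in self.inputs}
        labels = [r[target] for r in rows]
        self.class_counts = Counter(labels)
        # cond[column][label][state] = number of training cases with that combination
        self.cond = {c: defaultdict(Counter) for c in self.inputs}
        for i, label in enumerate(labels):
            for c in self.inputs:
                self.cond[c][label][columns[c][i]] += 1
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the label
            for c in self.inputs:
                state = row.get(c, MISSING)
                if state not in self.kept[c]:
                    state = MISSING  # unseen or dropped states count as missing
                k = len(self.kept[c])
                count = self.cond[c][label][state]
                # Smoothed log P(state | label), added because inputs are assumed independent.
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


rows = [
    {"Gender": "M", "Number Cars Owned": 0, "Bike Buyer": 1},
    {"Gender": "M", "Number Cars Owned": 0, "Bike Buyer": 1},
    {"Gender": "M", "Number Cars Owned": 1, "Bike Buyer": 1},
    {"Gender": "F", "Number Cars Owned": 2, "Bike Buyer": 0},
    {"Gender": "F", "Number Cars Owned": 2, "Bike Buyer": 0},
    {"Gender": "F", "Number Cars Owned": 0, "Bike Buyer": 0},
]
nb = CategoricalNaiveBayes(maximum_states=50).fit(rows, ["Gender", "Number Cars Owned"], "Bike Buyer")
print(nb.predict({"Gender": "M", "Number Cars Owned": 0}))  # 1
```

      With `maximum_states=1`, only the most frequent value of each input survives and every other value falls into the `Missing` bucket, which is how a low limit trades detail for a smaller model.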
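    • Data Types and Content Types
      The following table shows the list of data types and content types for mining structure columns:
      [Table not preserved in the transcript; surviving fragments mention content types used with Time Series models and with Sequence Clustering models in nested tables.]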
      DROP MINING MODEL deletes a mining model from the database.
      DROP MINING MODEL <model>
      model: A model identifier.
      Ex: The following sample code drops the mining model NBSample.
      DROP MINING MODEL [NBSample]
      Ex: Consider a case derived from two tables: one table contains customer information and the other contains customer purchases. A single customer in the customer table may have multiple purchases in the purchases table, which makes it difficult to describe the data using a single row.
      Analysis Services handles such cases by using nested tables.
      The concept of a nested table is demonstrated in the following illustration.
    • The first table, the parent table, contains information about customers and associates a unique identifier with each customer.
      The second table, the child table, contains the purchases for each customer.
      The purchases in the child table are related back to the parent table by the unique identifier, the CustomerKey column. The third table in the diagram shows the two tables combined.
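      A small Python sketch of that combination, using made-up customer and purchase rows: each customer row becomes one case, and the matching child rows are attached as a nested list under a hypothetical `Purchases` column, joined on `CustomerKey`. This only mirrors the shape of a nested-table case; it is not how Analysis Services stores nested tables.

```python
from collections import defaultdict


def build_nested_cases(customers, purchases, key="CustomerKey"):
    """Return one case per parent row, with matching child rows nested inside."""
    children = defaultdict(list)
    for purchase in purchases:
        # The key links the child row to its parent, so it is not repeated inside.
        children[purchase[key]].append({k: v for k, v in purchase.items() if k != key})
    # Customers without purchases still form a case, with an empty nested table.
    return [dict(customer, Purchases=children.get(customer[key], [])) for customer in customers]


customers = [
    {"CustomerKey": 1, "Gender": "F"},
    {"CustomerKey": 2, "Gender": "M"},
]
purchases = [
    {"CustomerKey": 1, "Product": "Bike"},
    {"CustomerKey": 1, "Product": "Helmet"},
]
for case in build_nested_cases(customers, purchases):
    print(case)
# {'CustomerKey': 1, 'Gender': 'F', 'Purchases': [{'Product': 'Bike'}, {'Product': 'Helmet'}]}
# {'CustomerKey': 2, 'Gender': 'M', 'Purchases': []}
```

      Each customer remains a single case no matter how many purchases it has, which is exactly the problem a flat one-row-per-purchase join cannot solve.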
    • Prediction
      Prediction means applying the patterns that were found in the data to estimate unknown information.
      Examples of prediction include predicting whether a customer will be a good candidate for a loan, estimating a credit score, determining which cluster a case belongs to, or predicting future values of a time series.
    • Prediction Join
      Using a prediction join in this example, we can conclude that:
      “If the kid is male and in class 5, then the highest-scored subject is science.”
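      The diagram behind this example is not reproduced here, but the idea can be sketched in Python under stated assumptions: a trained model is represented by a hypothetical lookup that holds only the pattern quoted above, and each new row is joined to the model by matching its gender and class columns to the model's inputs. The student rows and column names are made up for illustration.

```python
# Hypothetical learned pattern: (gender, class) -> highest-scored subject.
LEARNED_PATTERNS = {("Male", 5): "Science"}


def predict_subjects(new_rows):
    """Attach a prediction to each input row; None means no learned pattern applies."""
    results = []
    for row in new_rows:
        subject = LEARNED_PATTERNS.get((row["Gender"], row["Class"]))
        results.append({**row, "Predicted Subject": subject})
    return results


students = [
    {"Name": "Student A", "Gender": "Male", "Class": 5},
    {"Name": "Student B", "Gender": "Female", "Class": 5},
]
for result in predict_subjects(students):
    print(result["Name"], result["Predicted Subject"])
# Student A Science
# Student B None
```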
    • Prediction Join syntax
      SELECT [TOP <count>] <column references> FROM <mining model>
      PREDICTION JOIN <source data>
      [ ON <mapping clause> ]
      [ WHERE <condition clause> ]
      [ ORDER BY <order clause> [DESC | ASC] ]
      Count  Optional, An integer that specifies how many rows to return.
      column referencesA comma-separated list of column identifiers an expressions that are derived from the mining model.
      mining modelA model identifier.
      source -dataThe source query.
      mapping clauseOptional, A logical expression that compares columns from the model to columns from the source query.
      condition clause Optional, A condition to restrict the values that are returned from the column list.
      order clause Optional, An expression that returns a scalar value.
    • Summary
      History of DMX
      DMX Introduction
      DMX objects
      Query Syntax
      Prediction join syntax
    • Visit more self-help tutorials
      Pick a tutorial of your choice and browse through it at your own pace.
      The tutorials section is free, self-guiding and will not involve any additional support.
      Visit us at www.dataminingtools.net