Appendix A. An Introduction to Microsoft's OLE DB for Data Mining
  • 1. Data Mining: Concepts and Techniques (Slides for Textbook, Appendix A)
    • © Jiawei Han and Micheline Kamber
    • Slides contributed by Jian Pei (peijian@cs.sfu.ca)
    • Department of Computer Science
    • University of Illinois at Urbana-Champaign
    • www.cs.uiuc.edu/~hanj
  • 2. Appendix A: An Introduction to Microsoft’s OLE DB for Data Mining
    • Introduction
    • Overview and design philosophy
    • Basic components
      • Data set components
      • Data mining models
    • Operations on data model
    • Concluding remarks
  • 3. Why OLE DB for Data Mining?
    • An industry standard is critical for data mining development, usage, interoperability, and exchange
    • OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
    • Building mining applications over relational databases is nontrivial
      • Need different customized data mining algorithms and methods
      • Significant work on the part of application builders
    • Goal: ease the burden of developing mining applications in large relational databases
  • 4. Motivation of OLE DB for DM
    • Facilitate deployment of data mining models
      • Generate data mining models
      • Store, maintain, and refresh models as data is updated
      • Programmatically use the model on other data sets
      • Browse models
    • Enable enterprise application developers to participate in building data mining solutions
  • 5. Features of OLE DB for DM
    • Independent of provider or software
    • Not specialized to any specific mining model
    • Structured to cater to all well-known mining models
    • Part of the upcoming release of Microsoft SQL Server 2000
  • 6. Overview
    • Core relational engine exposes OLE DB in a language-based API
    • Analysis server exposes OLE DB OLAP and OLE DB DM
    • Maintain SQL metaphor
    • Reuse existing notions
    [Figure: the RDB engine exposes OLE DB to the Analysis Server, which exposes OLE DB OLAP/DM to data mining applications]
  • 7. Key Operations to Support Data Mining Models
    • Define a mining model
      • Attributes to be predicted
      • Attributes to be used for prediction
      • Algorithm used to build the model
    • Populate a mining model from training data
    • Predict attributes for new data
    • Browse a mining model for reporting and visualization
  • 8. DMM as Analogous to a Table in SQL
    • Create a data mining model object
      • CREATE MINING MODEL [model_name]
    • Insert training data into the model and train it
      • INSERT INTO [model_name]
    • Use the data mining model
      • SELECT relation_name.[id], [model_name].[predict_attr]
      • consult DMM content in order to make predictions and browse statistics obtained by the model
    • Use DELETE to empty/reset the model
    • Predictions on datasets: prediction join between a model and a data set (tables)
    • Deploy DMM by just writing SQL queries!
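The table metaphor above can be sketched outside any database. The following is a minimal illustration in Python, not OLE DB for DM itself: `ToyMiningModel`, its method names, and the `Age Group` values are all hypothetical, and the "algorithm" is a simple most-frequent-value count chosen only to keep the example short.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM: it keeps learned counts, not the cases."""

    def __init__(self, source_column, predict_column):
        # Analogous to CREATE MINING MODEL: only the structure is fixed here.
        self.source_column = source_column
        self.predict_column = predict_column
        self.counts = defaultdict(Counter)

    def insert_into(self, cases):
        # Analogous to INSERT INTO: consume cases and update the abstraction.
        for case in cases:
            self.counts[case[self.source_column]][case[self.predict_column]] += 1

    def predict(self, case):
        # Analogous to a prediction query: most frequent value seen for this input.
        seen = self.counts.get(case[self.source_column])
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def delete(self):
        # Analogous to DELETE: forget what was learned, keep the definition.
        self.counts.clear()


model = ToyMiningModel(source_column="Gender", predict_column="Age Group")
model.insert_into([
    {"Gender": "Male", "Age Group": "30-39"},
    {"Gender": "Male", "Age Group": "30-39"},
    {"Gender": "Female", "Age Group": "20-29"},
])
prediction_before = model.predict({"Gender": "Male"})
model.delete()
prediction_after = model.predict({"Gender": "Male"})
```

After `delete()` the model object still exists with its column definitions, but it no longer returns a prediction, which mirrors the empty/reset behaviour described for DELETE.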
  • 9. Two Basic Components
    • Cases/caseset: input data
      • A table or nested tables (for hierarchical data)
    • Data mining model (DMM): a special type of table
      • A caseset and meta-information are associated with a DMM when the DMM is created
      • Stores the mining algorithm and the resulting abstraction instead of the data itself
      • Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP
  • 10. Flattened Representation of Caseset
    • Source tables
      • Customers (Customer ID, Gender, Hair Color, Age, Age Prob)
      • Product Purchases (Customer ID, Product Name, Quantity, Product Type)
      • Car Ownership (Customer ID, Car, Car Prob)
    • Flattened caseset (join of the three tables)
      CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
      1    Male  Black  35   100%      TV    1     Elec  Car  100%
      1    Male  Black  35   100%      VCR   1     Elec  Car  100%
      1    Male  Black  35   100%      Ham   6     Food  Car  100%
      1    Male  Black  35   100%      TV    1     Elec  Van  50%
      1    Male  Black  35   100%      VCR   1     Elec  Van  50%
      1    Male  Black  35   100%      Ham   6     Food  Van  50%
    • Problem: Lots of replication!
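To make the replication concrete, the sketch below rebuilds the flattened caseset for customer 1 in Python and compares its size with a nested layout. The data values come from the slide; the variable names and the idea of counting "cells" are only illustrative assumptions.

```python
from itertools import product

# Customer 1 from the slide, with probabilities written as fractions.
customer = {"CID": 1, "Gend": "Male", "Hair": "Black", "Age": 35, "Age prob": 1.0}
purchases = [
    {"Prod": "TV", "Quan": 1, "Type": "Elec"},
    {"Prod": "VCR", "Quan": 1, "Type": "Elec"},
    {"Prod": "Ham", "Quan": 6, "Type": "Food"},
]
cars = [
    {"Car": "Car", "Car prob": 1.0},
    {"Car": "Van", "Car prob": 0.5},
]

# Flattening pairs every purchase with every car and repeats the customer fields.
flattened = [{**customer, **p, **c} for c, p in product(cars, purchases)]

row_count = len(flattened)
flattened_cells = sum(len(row) for row in flattened)

# A nested layout stores the customer once, plus each child row once.
nested_cells = len(customer) + sum(len(p) for p in purchases) + sum(len(c) for c in cars)
```

For this single customer the flattened form needs 6 rows and 60 cells, while the nested form needs 18 cells; the gap grows with every additional nested table.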
  • 11. Logical Nested Table Representation of Caseset
    • Use Data Shaping Service to generate a hierarchical rowset
      • Part of Microsoft Data Access Components (MDAC) products
    Nested caseset (one case per customer):
      CID  Gend  Hair   Age  Age prob  Product Purchases    Car Ownership
                                       Prod  Quan  Type     Car  Car prob
      1    Male  Black  35   100%      TV    1     Elec     Car  100%
                                       VCR   1     Elec     Van  50%
                                       Ham   6     Food
  • 12. More About Nested Tables
    • Not necessary for the storage subsystem to support nested records
    • Cases are only instantiated as nested rowsets prior to training/predicting data mining models
    • Same physical data may be used to generate different casesets
  • 13. Defining A Data Mining Model
    • The name of the model
    • The algorithm and parameters
    • The columns of caseset and the relationships among columns
    • “Source columns” and “prediction columns”
  • 14. Example
    CREATE MINING MODEL [Age Prediction]                  -- name of the model
    (
      [Customer ID]       LONG    KEY,                    -- source column
      [Gender]            TEXT    DISCRETE,               -- source column
      [Age]               DOUBLE  DISCRETIZED() PREDICT,  -- prediction column
      [Product Purchases] TABLE                           -- source column (nested table)
      (
        [Product Name]    TEXT    KEY,                    -- source column
        [Quantity]        DOUBLE  NORMAL CONTINUOUS,      -- source column
        [Product Type]    TEXT    DISCRETE
                          RELATED TO [Product Name]       -- source column
      )
    )
    USING [Decision_Trees_101]                            -- mining algorithm used
  • 15. Column Specifiers
    • KEY
    • ATTRIBUTE
    • RELATION (RELATED TO clause)
    • QUALIFIER (OF clause)
      • PROBABILITY: [0, 1]
      • VARIANCE
      • SUPPORT
      • PROBABILITY-VARIANCE
      • ORDER
      • TABLE
  • 16. Attribute Types
    • DISCRETE
    • ORDERED
    • CYCLICAL
    • CONTINUOUS
    • DISCRETIZED
    • SEQUENCE_TIME
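As an illustration of what DISCRETIZED means for a continuous column such as Age, here is a minimal Python sketch of equal-width bucketing. This is only one possible method, assumed here for simplicity; the slides do not say how a provider chooses its buckets, and the `discretize` function and the sample ages are hypothetical.

```python
def discretize(values, bucket_count):
    """Equal-width bucketing: return a 0-based bucket index for each value."""
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    if width == 0:
        # All values are identical, so a single bucket holds everything.
        return [0 for _ in values]
    # The maximum value would fall just past the last bucket, so clamp it.
    return [min(int((v - low) / width), bucket_count - 1) for v in values]


ages = [18, 22, 35, 41, 58, 78]  # hypothetical sample data
buckets = discretize(ages, 3)
```

With three buckets over the range 18 to 78, each bucket spans 20 years, so the model would treat Age as three discrete states rather than as a number.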
  • 17. Populating A DMM
    • Use INSERT INTO statement
    • The data mining model consumes each inserted case
    • Use SHAPE statement to create the nested table from the input data
  • 18. Example: Populating a DMM
    INSERT INTO [Age Prediction]
    (
      [Customer ID], [Gender], [Age],
      [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
    )
    SHAPE
      {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
      {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
      RELATE [Customer ID] TO [CustID]
    ) AS [Product Purchases]
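What SHAPE, APPEND, and RELATE produce in the example above can be approximated in a few lines of Python. This is a sketch of the resulting nested rowset only, not of the Data Shaping Service: the `shape` function and its parameters are hypothetical, the `skip` argument loosely mirrors the SKIP placeholder for the `[CustID]` column, and customer 2 is invented to show a case with no purchases.

```python
from collections import defaultdict


def shape(parents, children, parent_key, child_key, nested_name, skip=()):
    """Attach to each parent row the child rows whose child_key equals parent_key."""
    by_parent = defaultdict(list)
    for child in children:
        # Drop skipped columns (here the foreign key) from the nested rows.
        nested_row = {k: v for k, v in child.items() if k not in skip}
        by_parent[child[child_key]].append(nested_row)
    return [{**p, nested_name: by_parent.get(p[parent_key], [])} for p in parents]


customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},  # hypothetical extra customer
]
sales = [
    {"CustID": 1, "Product Name": "TV", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "Ham", "Quantity": 6, "Product Type": "Food"},
]

cases = shape(customers, sales, "Customer ID", "CustID", "Product Purchases",
              skip=("CustID",))
```

Each element of `cases` is one case: the customer's own columns plus a `Product Purchases` list that plays the role of the nested table.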
  • 19. Using Data Model to Predict
    • Prediction join
      • Prediction on dataset D using DMM M
      • Different from an equi-join
    • DMM: a “truth table”
    • SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM
  • 20. Example: Using a DMM in Prediction
    SELECT t.[Customer ID], [Age Prediction].[Age]
    FROM [Age Prediction]
    PREDICTION JOIN
    (
      SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
      APPEND
      (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
      ) AS [Product Purchases]
    ) AS t
    ON  [Age Prediction].[Gender] = t.[Gender]
    AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
    AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
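One way to see how a prediction join differs from an equi-join is to run both over the same input. The Python sketch below is a conceptual illustration under stated assumptions: `toy_model` is a made-up rule standing in for a trained model, the customer IDs and ages are invented, and `prediction_join` only mimics the idea that the ON clause maps input columns to model columns rather than matching stored rows.

```python
def equi_join(left, right, key):
    """Ordinary equi-join: keep only pairs of rows whose key values are equal."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def toy_model(case):
    # Hypothetical rule standing in for a trained model.
    return 35 if case["Gender"] == "Male" else 28


def prediction_join(model, cases, predict_name):
    """Apply the model to every input case and append the predicted column."""
    return [{**case, predict_name: model(case)} for case in cases]


training_rows = [{"Gender": "Male", "Age": 35}]
new_cases = [
    {"Customer ID": 7, "Gender": "Male"},
    {"Customer ID": 8, "Gender": "Female"},
]

joined = equi_join(new_cases, training_rows, "Gender")
predicted = prediction_join(toy_model, new_cases, "Age")
```

The equi-join returns a row only for customer 7, whose gender happens to appear in the stored rows, while the prediction join returns one row per input case. This is consistent with viewing the DMM as a "truth table" that has an answer for every combination of inputs.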
  • 21. Browsing DMM
    • What is in a DMM?
      • Rules, formulas, trees, etc.
    • Browsing DMM
      • Visualization
  • 22. Concluding Remarks
    • OLE DB for DM integrates data mining and database systems
      • A good standard for mining application builders
    • How can we be involved?
      • Provide association/sequential pattern mining modules for OLE DB for DM?
      • Design more concrete language primitives?
    • References
      • http://www.microsoft.com/data.oledb/dm.html
  • 23. www.cs.uiuc.edu/~hanj
    • Thank you!