Data Mining:  Concepts and Techniques   — Slides for Textbook —  —  Appendix A  — <ul><li>©Jiawei Han and Micheline Kamber...
Appendix A: An Introduction to Microsoft’s OLE OLDB for Data Mining <ul><li>Introduction </li></ul><ul><li>Overview and de...
Why OLE DB for Data Mining? <ul><li>Industry standard is critical for data mining development, usage, interoperability, an...
Motivation of OLE DB for DM <ul><li>Facilitate deployment of data mining models </li></ul><ul><ul><li>Generating data mini...
Features of OLE DB for DM <ul><li>Independent of provider or software </li></ul><ul><li>Not specialized to any specific mi...
Overview <ul><li>Core relational engine exposes OLE DB in a language-based API </li></ul><ul><li>Analysis server exposes O...
Key Operations to Support Data Mining Models <ul><li>Define  a mining model </li></ul><ul><ul><li>Attributes to be predict...
DMM As Analogous to A Table in SQL <ul><li>Create a data mining module object </li></ul><ul><ul><li>CREATE MINING MODEL [m...
Two Basic Components <ul><li>Cases/caseset: input data </li></ul><ul><ul><li>A table or nested tables (for hierarchical da...
Flatterned Representation of Caseset Problem: Lots of replication! Age Prob Age Hair Color Gender Customer ID Customers Ca...
Logical Nested Table Representation of Caseset <ul><li>Use Data Shaping Service to generate a hierarchical rowset </li></u...
More About Nested Table <ul><li>Not necessary for the storage subsystem to support nested records </li></ul><ul><li>Cases ...
Defining A Data Mining Model <ul><li>The name of the model </li></ul><ul><li>The algorithm and parameters </li></ul><ul><l...
Example CREATE MINING MODEL [Age Prediction] %Name of Model ( [Customer ID] LONG KEY, %source column [Gender] TEXT DISCRET...
Column Specifiers <ul><li>KEY </li></ul><ul><li>ATTRIBUTE </li></ul><ul><li>RELATION (RELATED TO clause) </li></ul><ul><li...
Attribute Types <ul><li>DISCRETE </li></ul><ul><li>ORDERED </li></ul><ul><li>CYCLICAL </li></ul><ul><li>CONTINOUS </li></u...
Populating A DMM <ul><li>Use INSERT INTO statement </li></ul><ul><li>Consuming a case using the data mining model </li></u...
Example: Populating a DMM INSERT INTO [Age Prediction] ( [Customer ID], [Gender], [Age], [Product Purchases](SKIP, [Produc...
Using Data Model to Predict <ul><li>Prediction join </li></ul><ul><ul><li>Prediction on dataset D using DMM M </li></ul></...
Example: Using a DMM in Prediction SELECT t.[Customer ID], [Age Prediction].[Age] FROM [Age Prediction] PRECTION JOIN (SHA...
Browsing DMM <ul><li>What is in a DMM? </li></ul><ul><ul><li>Rules, formulas, trees, …, etc </li></ul></ul><ul><li>Browsin...
Concluding Remarks <ul><li>OLE DB for DM integrates data mining and database systems </li></ul><ul><ul><li>A good standard...
www.cs.uiuc.edu/~hanj <ul><li>Thank you !!! </li></ul>
Upcoming SlideShare
Loading in …5
×

Appendix A. An Introduction to Microsoft's OLE DB for Data Mining

1,442 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,442
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
49
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Appendix A. An Introduction to Microsoft's OLE DB for Data Mining

  1. 1. Data Mining: Concepts and Techniques — Slides for Textbook — — Appendix A — <ul><li>©Jiawei Han and Micheline Kamber </li></ul><ul><li>Slides contributed by Jian Pei (peijian@cs.sfu.ca) </li></ul><ul><li>Department of Computer Science </li></ul><ul><li>University of Illinois at Urbana-Champaign </li></ul><ul><li>www.cs.uiuc.edu/~hanj </li></ul>
  2. 2. Appendix A: An Introduction to Microsoft’s OLE OLDB for Data Mining <ul><li>Introduction </li></ul><ul><li>Overview and design philosophy </li></ul><ul><li>Basic components </li></ul><ul><ul><li>Data set components </li></ul></ul><ul><ul><li>Data mining models </li></ul></ul><ul><li>Operations on data model </li></ul><ul><li>Concluding remarks </li></ul>
  3. 3. Why OLE DB for Data Mining? <ul><li>Industry standard is critical for data mining development, usage, interoperability, and exchange </li></ul><ul><li>OLEDB for DM is a natural evolution from OLEDB and OLDB for OLAP </li></ul><ul><li>Building mining applications over relational databases is nontrivial </li></ul><ul><ul><li>Need different customized data mining algorithms and methods </li></ul></ul><ul><ul><li>Significant work on the part of application builders </li></ul></ul><ul><li>Goal: ease the burden of developing mining applications in large relational databases </li></ul>
  4. 4. Motivation of OLE DB for DM <ul><li>Facilitate deployment of data mining models </li></ul><ul><ul><li>Generating data mining models </li></ul></ul><ul><ul><li>Store, maintain and refresh models as data is updated </li></ul></ul><ul><ul><li>Programmatically use the model on other data set </li></ul></ul><ul><ul><li>Browse models </li></ul></ul><ul><li>Enable enterprise application developers to participate in building data mining solutions </li></ul>
  5. 5. Features of OLE DB for DM <ul><li>Independent of provider or software </li></ul><ul><li>Not specialized to any specific mining model </li></ul><ul><li>Structured to cater to all well-known mining models </li></ul><ul><li>Part of upcoming release of Microsoft SQL Server 2000 </li></ul>
  6. 6. Overview <ul><li>Core relational engine exposes OLE DB in a language-based API </li></ul><ul><li>Analysis server exposes OLE DB OLAP and OLE DB DM </li></ul><ul><li>Maintain SQL metaphor </li></ul><ul><li>Reuse existing notions </li></ul>RDB engine OLE DB Analysis Server OLE DB OLAP/DM Data mining applications
  7. 7. Key Operations to Support Data Mining Models <ul><li>Define a mining model </li></ul><ul><ul><li>Attributes to be predicted </li></ul></ul><ul><ul><li>Attributes to be used for prediction </li></ul></ul><ul><ul><li>Algorithm used to build the model </li></ul></ul><ul><li>Populate a mining model from training data </li></ul><ul><li>Predict attributes for new data </li></ul><ul><li>Browse a mining model fro reporting and visualization </li></ul>
  8. 8. DMM As Analogous to A Table in SQL <ul><li>Create a data mining module object </li></ul><ul><ul><li>CREATE MINING MODEL [model_name] </li></ul></ul><ul><li>Insert training data into the model and train it </li></ul><ul><ul><li>INSERT INTO [model_name] </li></ul></ul><ul><li>Use the data mining model </li></ul><ul><ul><li>SELECT relation_name.[id], [model_name].[predict_attr] </li></ul></ul><ul><ul><li>consult DMM content in order to make predictions and browse statistics obtained by the model </li></ul></ul><ul><li>Using DELETE to empty/reset </li></ul><ul><li>Predictions on datasets: prediction join between a model and a data set (tables) </li></ul><ul><li>Deploy DMM by just writing SQL queries! </li></ul>
  9. 9. Two Basic Components <ul><li>Cases/caseset: input data </li></ul><ul><ul><li>A table or nested tables (for hierarchical data) </li></ul></ul><ul><li>Data mining model (DMM): a special type of table </li></ul><ul><ul><li>A caseset is associated with a DMM and meta-info while creating a DMM </li></ul></ul><ul><ul><li>Save mining algorithm and resulting abstraction instead of data itself </li></ul></ul><ul><ul><li>Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP </li></ul></ul>
  10. 10. Flatterned Representation of Caseset Problem: Lots of replication! Age Prob Age Hair Color Gender Customer ID Customers Car Prob Car Customer ID Car Owernership Product Type Quantity Product Name Customer ID Product Purchases 50% Van Food 6 Ham 100% 35 Black Male 1 50% Van Elec 1 VCR 100% 35 Black Male 1 50% Van Elec 1 TV 100% 35 Black Male 1 100% Car Food 6 Ham 100% 35 Black Male 1 100% Car Elec 1 VCR 100% 35 Black Male 1 100% Car Elec 1 TV 100% 35 Black Male 1 Car prob Car Type Quan Prod Age prob Age Hair Gend CID
  11. 11. Logical Nested Table Representation of Caseset <ul><li>Use Data Shaping Service to generate a hierarchical rowset </li></ul><ul><ul><li>Part of Microsoft Data Access Components (MDAC) products </li></ul></ul>Car Ownership Product Purchases Food 6 Ham 50% Van Elec 1 VCR 100% Car Elec 1 TV 100% 35 Black Male 1 Car prob Car Type Quan Prod Age prob Age Hair Gend CID
  12. 12. More About Nested Table <ul><li>Not necessary for the storage subsystem to support nested records </li></ul><ul><li>Cases are only instantiated as nested rowsets prior to training/predicting data mining models </li></ul><ul><li>Same physical data may be used to generate different casesets </li></ul>
  13. 13. Defining A Data Mining Model <ul><li>The name of the model </li></ul><ul><li>The algorithm and parameters </li></ul><ul><li>The columns of caseset and the relationships among columns </li></ul><ul><li>“ Source columns” and “prediction columns” </li></ul>
  14. 14. Example CREATE MINING MODEL [Age Prediction] %Name of Model ( [Customer ID] LONG KEY, %source column [Gender] TEXT DISCRETE, %source column [Age] Double DISCRETIZED() PREDICT, %prediction column [Product Purchases] TABLE %source column ( [Product Name] TEXT KEY, %source column [Quantity] DOUBLE NORMAL CONTINUOUS, %source column [Product Type] TEXT DISCRETE RELATED TO [Product Name] %source column )) USING [Decision_Trees_101] %Mining algorithm used
  15. 15. Column Specifiers <ul><li>KEY </li></ul><ul><li>ATTRIBUTE </li></ul><ul><li>RELATION (RELATED TO clause) </li></ul><ul><li>QUALIFIER (OF clause) </li></ul><ul><ul><li>PROBABILITY: [0, 1] </li></ul></ul><ul><ul><li>VARIANCE </li></ul></ul><ul><ul><li>SUPPORT </li></ul></ul><ul><ul><li>PROBABILITY-VARIANCE </li></ul></ul><ul><ul><li>ORDER </li></ul></ul><ul><ul><li>TABLE </li></ul></ul>
  16. 16. Attribute Types <ul><li>DISCRETE </li></ul><ul><li>ORDERED </li></ul><ul><li>CYCLICAL </li></ul><ul><li>CONTINOUS </li></ul><ul><li>DISCRETIZED </li></ul><ul><li>SEQUENCE_TIME </li></ul>
  17. 17. Populating A DMM <ul><li>Use INSERT INTO statement </li></ul><ul><li>Consuming a case using the data mining model </li></ul><ul><li>Use SHAPE statement to create the nested table from the input data </li></ul>
  18. 18. Example: Populating a DMM INSERT INTO [Age Prediction] ( [Customer ID], [Gender], [Age], [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type]) ) SHAPE {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]} APPEND {SELECT [CustID], {product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]} RELATE [Customer ID] TO [CustID] ) AS [Product Purchases]
  19. 19. Using Data Model to Predict <ul><li>Prediction join </li></ul><ul><ul><li>Prediction on dataset D using DMM M </li></ul></ul><ul><ul><li>Different to equi-join </li></ul></ul><ul><li>DMM: a “truth table” </li></ul><ul><li>SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM </li></ul>
  20. 20. Example: Using a DMM in Prediction SELECT t.[Customer ID], [Age Prediction].[Age] FROM [Age Prediction] PRECTION JOIN (SHAPE {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]} APPEND ( {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]} RELATE [Customer ID] TO [CustID] ) AS [Product Purchases] ) AS t ON [Age Prediction].[Gender]=t.[Gender] AND [Age Prediction].[Product Purchases].[Product Name]=t.[Product Purchases].[Product Name] AND [Age Prediction].[Product Purchases].[Quantity]=t.[Product Purchases].[Quantity]
  21. 21. Browsing DMM <ul><li>What is in a DMM? </li></ul><ul><ul><li>Rules, formulas, trees, …, etc </li></ul></ul><ul><li>Browsing DMM </li></ul><ul><ul><li>Visualization </li></ul></ul>
  22. 22. Concluding Remarks <ul><li>OLE DB for DM integrates data mining and database systems </li></ul><ul><ul><li>A good standard for mining application builders </li></ul></ul><ul><li>How can we be involved? </li></ul><ul><ul><li>Provide association/sequential pattern mining modules for OLE DB for DM? </li></ul></ul><ul><ul><li>Design more concrete language primitives? </li></ul></ul><ul><li>References </li></ul><ul><ul><li>http://www.microsoft.com/data.oledb/dm.html </li></ul></ul>
  23. 23. www.cs.uiuc.edu/~hanj <ul><li>Thank you !!! </li></ul>

×