Extending SQL Server



  1. Ryan Ebert, Richard Rogers, Nick Crampton. CS 453 Final Project, Chapter 17. Due: 6-3-08.

     Extending SQL Server Data Mining

     One of the problems with a data mining tool is that, no matter how many features it offers or how good it is, the tool becomes useless the moment you need features or algorithms that it does not provide. Microsoft SQL Server addresses this problem in four ways: using stored procedures to add business logic or enhanced intelligence on top of SQL Server Data Mining; using the Visual Studio extensibility mechanisms to extend and enhance the data mining tools; developing plug-in algorithms that extend the algorithm set available on the server; and writing new viewers to visualize data mining models.

     SQL Server 2005 allows third-party companies to create plug-in algorithms that run inside Analysis Services. A plug-in algorithm is a COM (Component Object Model) object that implements a set of interfaces specified by the SQL Server Data Mining team. From an end user’s point of view, a plug-in algorithm works the same way as a built-in algorithm.

     A plug-in algorithm consists of two distinct kinds of objects. The first is the algorithm factory, which creates algorithm objects and exposes the capabilities of the algorithm through the metadata interface. The second is the algorithm objects themselves. Analysis Services creates one instance of an algorithm object for each mining model using the algorithm. This object is responsible
  2. for handling processing, prediction, and content navigation requests, and contains the in-memory representation of the learned content of the mining model.

     The data stream provided to the algorithm is a stream of attribute-value pairs representing a set of cases; this stream is called a case set. When implementing an algorithm, keep its capabilities in mind and represent its functionality accurately by exposing the correct algorithm metadata.

     From the algorithm’s point of view, model creation begins when Analysis Services calls the algorithm factory’s interface to validate the model structure against the metadata exposed by the algorithm. When a process command is sent to the mining model, the algorithm factory is called to instantiate a new algorithm object. Algorithm processing then begins when Analysis Services calls a method on the algorithm object, passing an execution context and the case set.

     The Analysis Services data mining architecture allows for an unlimited number of attributes and an unlimited number of discrete states per attribute. The plug-in architecture allows for algorithm-specific prediction methods in custom interfaces. The main Predict method is used to predict a specific attribute value, retrieve statistics about a predicted value, retrieve statistics about all possible predicted values, and predict sets of attributes. Analysis Services parses DMX statements and prepares input cases in a manner similar to training.

     SQL Server Data Mining exposes learned content as a parent-child nested rowset. Each row represents a node in the model’s content and specifies the node’s unique ID, parent ID, node type, and a set of distributions in a nested table. It is up to the algorithm developer to decide how the algorithm represents its content.
  3. You can wrap the COM interfaces with .NET and write managed plug-ins in languages such as C# and Visual Basic .NET. To install a plug-in algorithm, the COM object must be registered and the server’s .ini file updated.

     Another way to extend SQL Server Data Mining is to add data mining visualizations. The visualization architecture in the Data Mining Designer allows you to add views for any or all algorithms.

     This chapter covered the basic concepts behind SQL Server Data Mining. All information was taken from Data Mining with SQL Server 2005 by ZhaoHui Tang and Jamie MacLennan, copyright 2005.
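
The division of labor between the algorithm factory and the algorithm object described in this report can be illustrated with a small Python sketch. It is only a conceptual model and does not reproduce the COM interfaces defined by the SQL Server Data Mining team: every class, method, and field name below (MajorityAlgorithmFactory, MajorityAlgorithm, ContentNode, validate, create, process, predict, content) is hypothetical, and the "algorithm" is a deliberately trivial one that predicts the most frequent state of a single discrete attribute. The sketch assumes a case is a dictionary of attribute-value pairs and a case set is any iterable of such dictionaries.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: a case is a set of attribute/value pairs, and a case set
# is a stream (any iterable) of cases.
Case = Dict[str, str]


@dataclass
class ContentNode:
    """One row of a parent-child content rowset (hypothetical layout)."""
    node_id: str
    parent_id: Optional[str]
    node_type: str
    distribution: Dict[str, int] = field(default_factory=dict)


class MajorityAlgorithm:
    """Hypothetical algorithm object: one instance per mining model."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.counts: Counter = Counter()  # in-memory learned content

    def process(self, case_set: Iterable[Case]) -> None:
        """Training: read the case set once and count target states."""
        for case in case_set:
            if self.target in case:
                self.counts[case[self.target]] += 1

    def predict(self, case: Case) -> Tuple[str, float]:
        """Return the most frequent state and its observed frequency.

        The input case is ignored because this toy model has no inputs.
        """
        if not self.counts:
            raise RuntimeError("model has not been processed yet")
        state, support = self.counts.most_common(1)[0]
        return state, support / sum(self.counts.values())

    def content(self) -> List[ContentNode]:
        """Expose learned content as flat rows linked by parent_id."""
        root = ContentNode("0", None, "model")
        leaf = ContentNode("1", "0", "distribution", dict(self.counts))
        return [root, leaf]


class MajorityAlgorithmFactory:
    """Hypothetical factory: exposes metadata and creates algorithm objects."""

    metadata = {"name": "Toy_Majority", "predicts_continuous": False}

    def validate(self, structure: Dict[str, str], target: str) -> None:
        """Reject model structures that the metadata does not allow."""
        if target not in structure:
            raise ValueError(f"unknown target column: {target}")
        if structure[target] != "discrete":
            raise ValueError("this sketch predicts discrete attributes only")

    def create(self, structure: Dict[str, str], target: str) -> MajorityAlgorithm:
        """Validate the structure, then instantiate a new algorithm object."""
        self.validate(structure, target)
        return MajorityAlgorithm(target)


if __name__ == "__main__":
    factory = MajorityAlgorithmFactory()
    model = factory.create({"Age": "continuous", "Buyer": "discrete"}, "Buyer")
    model.process([{"Age": "34", "Buyer": "yes"}, {"Buyer": "no"}, {"Buyer": "yes"}])
    print(model.predict({"Age": "50"}))
    for row in model.content():
        print(row)
```

Keeping validation and metadata in the factory mirrors the report's point that the server can check a model structure before any algorithm object exists, while all learned state lives in the per-model algorithm object. A real plug-in would implement the server's actual interfaces rather than these illustrative methods.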