Final Project Chapter 17
Extending SQL Server Data Mining
One of the problems with a data mining tool is that no matter how many features
it has or how good it is, the tool becomes useless the moment you require features or
algorithms it does not provide. Microsoft SQL Server addresses this problem in four
different ways: using stored procedures to add business logic or enhanced intelligence
on top of SQL Server Data Mining, using the Visual Studio extensibility mechanisms to
extend and enhance the data mining tools, developing plug-in algorithms that extend the
algorithm set available on the server, and writing new viewers to visualize data mining
models.
SQL Server 2005 allows third-party companies to create and implement plug-in
algorithms that run inside Analysis Services. A plug-in algorithm is a COM
(Component Object Model) object that implements a set of interfaces
specified by the SQL Server Data Mining team. From an end-user point of view,
a plug-in algorithm works the same way as a built-in algorithm.
When creating a plug-in algorithm, you need two distinct objects. The first
is an algorithm factory, which creates algorithm objects and exposes the
capabilities of the algorithm through the metadata interface. The second is the
algorithm object itself. Analysis Services creates one instance of an
algorithm object for each mining model using the algorithm. This object is responsible
for handling processing, prediction, and content navigation requests, and contains the
in-memory representation of the learned content of the mining model.
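The split between factory and algorithm object can be sketched in a few lines of Python. This is only a conceptual illustration, not the COM interfaces themselves: the class names, the metadata fields, and the create_algorithm method are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmMetadata:
    """Capabilities a factory advertises (hypothetical fields)."""
    name: str
    supports_continuous_inputs: bool
    supports_prediction: bool


class PluginAlgorithm:
    """One instance per mining model; holds that model's learned content."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self.learned_content: dict = {}  # in-memory state for this model only


class AlgorithmFactory:
    """Creates algorithm objects and exposes the algorithm's metadata."""

    metadata = AlgorithmMetadata(
        name="Hypothetical_Majority_Classifier",  # assumed name, illustration only
        supports_continuous_inputs=False,
        supports_prediction=True,
    )

    def create_algorithm(self, model_name: str) -> PluginAlgorithm:
        return PluginAlgorithm(model_name)


factory = AlgorithmFactory()
model_a = factory.create_algorithm("CustomerChurn")
model_b = factory.create_algorithm("LoanRisk")
```

Each mining model gets its own algorithm object, so the learned content of one model never mixes with that of another, while the metadata lives once on the factory.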
The data stream provided to the algorithm is a stream of attribute-value pairs
representing a set of cases. The stream is called a case set. It is important when
implementing an algorithm that you keep in mind the capabilities of your algorithm and
accurately represent your functionality by exposing the correct algorithm metadata.
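To make the idea of a case set concrete, the sketch below turns a few rows into a stream of cases, each a list of attribute-value pairs. The function name, the sample columns, and the choice to skip missing values are assumptions made for illustration, not the format Analysis Services uses internally.

```python
from typing import Iterable, Iterator, List, Tuple

AttributeValue = Tuple[str, object]
Case = List[AttributeValue]


def case_set(rows: Iterable[dict]) -> Iterator[Case]:
    """Yield one case at a time as a list of (attribute, value) pairs."""
    for row in rows:
        # Assumption: a missing value (None) is simply left out of the case.
        yield [(attribute, value) for attribute, value in row.items()
               if value is not None]


rows = [
    {"Age": 34, "Gender": "F", "Bought": "Yes"},
    {"Age": 51, "Gender": None, "Bought": "No"},
]
cases = list(case_set(rows))
```

Because the cases arrive as a stream, an algorithm written against this shape can read them one at a time instead of loading the whole training set first.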
Model creation from the algorithm point of view begins when Analysis Services calls
the algorithm factory’s interface to validate the model structure against the metadata
exposed by the algorithm. When a process command is sent to the mining model, the
algorithm factory is called to instantiate a new algorithm object. Algorithm
processing then begins when Analysis Services calls a method on the algorithm object,
passing an execution context and the case set. The Analysis Services data mining
architecture allows for an unlimited number of attributes and an unlimited number of
discrete states per attribute.
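The sequence described above (validate, instantiate, then train) might look like the following sketch. All names here are hypothetical stand-ins for the real interfaces, and the "algorithm" only counts states. Dictionaries are used so that nothing in the code fixes the number of attributes or states in advance.

```python
class StructureError(ValueError):
    """Raised when a model structure does not fit the algorithm's metadata."""


class CountingAlgorithm:
    def __init__(self) -> None:
        self.state_counts: dict = {}  # attribute -> {state -> count}

    def train(self, context: dict, case_set) -> None:
        # context stands in for the execution context; it is unused here.
        for case in case_set:
            for attribute, value in case:
                states = self.state_counts.setdefault(attribute, {})
                states[value] = states.get(value, 0) + 1


class CountingFactory:
    supported_content_types = {"discrete"}  # assumed metadata

    def validate(self, structure: dict) -> None:
        for column, content_type in structure.items():
            if content_type not in self.supported_content_types:
                raise StructureError(f"{column}: {content_type} is not supported")

    def create_algorithm(self) -> CountingAlgorithm:
        return CountingAlgorithm()


factory = CountingFactory()
# Step 1: at model creation, check the structure against the metadata.
factory.validate({"Gender": "discrete", "Bought": "discrete"})
# Step 2: on a process command, instantiate the algorithm object.
algorithm = factory.create_algorithm()
# Step 3: start processing with an execution context and the case set.
algorithm.train(
    {"model": "Demo"},
    [[("Gender", "F"), ("Bought", "Yes")], [("Gender", "M"), ("Bought", "Yes")]],
)
```

A structure with an unsupported column type is rejected in step 1, before any algorithm object exists.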
The plug-in architecture allows for algorithm-specific prediction methods in custom
interfaces. The main Predict method is used for predicting a specific attribute value,
retrieving statistics about a predicted value, retrieving statistics about all possible
predicted values, and predicting sets of attributes. Analysis Services parses DMX
statements and prepares input cases in a manner similar to training.
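The different uses of a single prediction entry point can be illustrated with a deliberately trivial model that always predicts the most common state of its target. The class, its method names, and the include_statistics flag are hypothetical. The real Predict method belongs to the plug-in interfaces and is not reproduced here.

```python
from collections import Counter


class MajorityModel:
    """Predicts the most frequent state of one target attribute."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.counts: Counter = Counter()

    def train(self, case_set) -> None:
        for case in case_set:
            for attribute, value in case:
                if attribute == self.target:
                    self.counts[value] += 1

    def histogram(self) -> list:
        """Statistics for every possible predicted value: (state, support, probability)."""
        total = sum(self.counts.values())
        return [(state, count, count / total)
                for state, count in self.counts.most_common()]

    def predict(self, input_case, include_statistics: bool = False):
        """Return the predicted state, optionally with its support and probability."""
        if not self.counts:
            raise ValueError("model has not been trained")
        # This toy model ignores input_case; a real algorithm would use it.
        state, support, probability = self.histogram()[0]
        return (state, support, probability) if include_statistics else state


model = MajorityModel(target="Bought")
model.train([[("Bought", "Yes")], [("Bought", "Yes")],
             [("Bought", "No")], [("Bought", "Yes")]])
prediction = model.predict([("Gender", "F")])
with_stats = model.predict([("Gender", "F")], include_statistics=True)
```

Here prediction is the bare value, with_stats adds the statistics for that value, and histogram() covers all possible predicted values.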
SQL Server Data Mining exposes learned content as a parent-child nested rowset.
Each row represents a node in the model’s content and specifies the node’s unique ID,
parent ID, node type, and a set of distributions in a nested table. It is up to the
algorithm developer to decide how the algorithm represents its content.
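As a rough picture of the parent-child layout, the sketch below flattens a small tree of nodes into rows, each carrying its own ID, its parent's ID, a node type, and a nested list of distribution rows. The field names and node types are invented for this example and are not the actual column names of the content rowset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContentNode:
    node_id: str
    node_type: str
    distribution: Dict[str, int] = field(default_factory=dict)  # state -> support
    children: List["ContentNode"] = field(default_factory=list)


def to_rows(node: ContentNode, parent_id: Optional[str] = None) -> List[dict]:
    """Flatten a node tree into parent-child rows with a nested distribution table."""
    rows = [{
        "node_id": node.node_id,
        "parent_id": parent_id,  # None marks the root
        "node_type": node.node_type,
        "distribution": [{"state": state, "support": support}
                         for state, support in node.distribution.items()],
    }]
    for child in node.children:
        rows.extend(to_rows(child, node.node_id))
    return rows


root = ContentNode("0", "model", {"Yes": 3, "No": 1}, children=[
    ContentNode("1", "leaf", {"Yes": 3}),
    ContentNode("2", "leaf", {"No": 1}),
])
rows = to_rows(root)
```

A viewer can rebuild the tree from these rows by following parent_id, which is why the same flat shape can serve very different algorithms.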
You can wrap the COM interfaces with .NET and write managed plug-ins in
languages like C# and Visual Basic .NET. To install a plug-in algorithm, the COM object
must be registered and the server’s .ini file updated.
Another way to extend SQL Server Data Mining is to add data mining visualizations. The
visualization architecture in the Data Mining Designer allows you to add viewers for any
or all algorithms.
This chapter covered the basic concepts behind extending SQL Server Data Mining. All
information was taken from Data Mining with SQL Server 2005 by ZhaoHui Tang and Jamie
MacLennan, copyright