Overview
Standard Tasks in SSIS
SSIS Packages
Data Flow
Working with SSIS in Data Mining
Data Mining Transformations
Text Mining Transformations
Summary
Overview of SSIS
SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server that can be used to perform a broad range of data migration tasks. It is a platform for data integration and workflow applications, and it includes a fast, flexible data warehousing tool for data extraction, transformation, and loading (ETL). The tool can also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.
SSIS Packages
A package is the basic deployment and execution unit of an SSIS project, and it is the container for SSIS flows. To create a package, right-click the SSIS Packages folder in the Integration Services project and select the New SSIS Package menu item. An SSIS project may contain multiple packages. Each package contains exactly one control flow, which may contain one or more data flows. In addition to the control flow and data flows, a package contains SSIS connections and package variables.
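The containment rules above can be sketched as a small data model. This is only an illustration in Python, not the SSIS object model: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFlow:
    name: str


@dataclass
class ControlFlow:
    # A control flow may hold one or more data flows.
    data_flows: List[DataFlow] = field(default_factory=list)


@dataclass
class Package:
    name: str
    # Exactly one control flow per package.
    control_flow: ControlFlow = field(default_factory=ControlFlow)
    # Connections and variables live at the package level.
    connections: Dict[str, str] = field(default_factory=dict)
    variables: Dict[str, object] = field(default_factory=dict)


@dataclass
class Project:
    name: str
    # A project may contain multiple packages.
    packages: List[Package] = field(default_factory=list)


# Hypothetical example values, for illustration only.
project = Project("MiningProject")
package = Package("LoadCustomers")
package.control_flow.data_flows.append(DataFlow("Load and clean customers"))
package.connections["Source"] = "placeholder connection string"
package.variables["BatchSize"] = 1000
project.packages.append(package)
```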
Task Flow and Containers
Tasks are listed in the SSIS Toolbox. To add a task to a package, drag it from the Toolbox and drop it onto the package designer. A package usually contains multiple tasks arranged in a task flow, and precedence constraints determine the order in which those tasks run. Containers are SSIS objects that provide structure to a package. Each package has a container, which stores the flows of the package.
To save the updated package, click Save Selected Items on the File menu.
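As a rough illustration of how precedence constraints impose an order on a task flow, the sketch below treats each constraint as a (predecessor, successor) pair and derives one valid run order. It is written in Python and is not SSIS code. It ignores the success, failure, and completion conditions that real precedence constraints evaluate, and the function name and task names are made up for the example.

```python
from collections import deque


def execution_order(tasks, constraints):
    """Return one run order in which every predecessor precedes its successor.

    tasks: list of task names.
    constraints: list of (predecessor, successor) pairs.
    """
    successors = {task: [] for task in tasks}
    pending = {task: 0 for task in tasks}  # unfinished predecessors per task
    for before, after in constraints:
        successors[before].append(after)
        pending[after] += 1

    # Tasks with no predecessors can start immediately.
    ready = deque(task for task in tasks if pending[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("precedence constraints form a cycle")
    return order


# Hypothetical task names, listed out of order on purpose.
tasks = ["Process mining model", "Truncate staging", "Load staging"]
constraints = [
    ("Truncate staging", "Load staging"),
    ("Load staging", "Process mining model"),
]
order = execution_order(tasks, constraints)
```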
Working with SSIS in Data Mining
In a data mining project, SSIS is used to load data from various sources, combine those sources, normalize column values, remove dirty records, replace missing values, split data into training and testing data sets, and so on. SSIS is more than an ETL tool for data mining, because it also provides a few built-in data mining components in the control flow and data flow environments.
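Two of the preparation steps listed above, replacing missing values and splitting data into training and testing sets, can be sketched in a few lines. This is plain Python for illustration, not an SSIS transformation. Filling with the column mean, the 25% test fraction, the fixed seed, and the sample rows are all assumptions chosen for the example.

```python
import random


def fill_missing(rows, column):
    """Replace None in `column` with the mean of the known values."""
    known = [row[column] for row in rows if row[column] is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    mean = sum(known) / len(known)
    return [
        dict(row, **{column: mean if row[column] is None else row[column]})
        for row in rows
    ]


def split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and return (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample data.
ages = [30, None, 50, 40, 25, None, 35, 60]
rows = [{"id": i, "age": age} for i, age in enumerate(ages, start=1)]

cleaned = fill_missing(rows, "age")
training, testing = split_rows(cleaned)
```

A fixed seed keeps the split reproducible, which makes it easier to compare models trained on the same partition.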
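Data Mining Transformations
The data flow components can be categorized in three large groups, depending on their position in the data flow:
Source components, which read data into the data flow
Transformation components, which modify the data as it passes through
Destination components, which receive the output of the data flow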
Text Mining Transformations
To perform text mining with SQL Server Data Mining, you must first bring the text into a form that the algorithms can consume. The solution included in the product is to represent each piece of text as a collection of words and phrases.
After each document is represented as a collection of key phrases, you can perform data mining using one of the following model types:
Classification models that use the key words and phrases nested table as input to predict the class of a document
Clustering models that find similar documents based on the key words and phrases they have in common
Association models that detect cross-correlations between key words and phrases
The process of text mining usually consists of at least the following three phases:
1. Extraction transformation: Build a dictionary of key words and phrases over a collection of representative documents.
2. Lookup transformation: Based on the dictionary, extract the list of significant key words and phrases for each document to be analyzed.
3. Train mining models on top of the transformed data.
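The first two phases can be sketched in Python to show the shape of the data that reaches the mining models. This is a simplified stand-in, not the algorithm the SSIS transformations use: it handles single words only, and it keeps a term when the term appears in at least `min_docs` documents. The function names, the threshold, and the sample texts are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(documents, min_docs=2):
    """Phase 1: keep terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(tokenize(text)))  # count each term once per document
    return {term for term, count in doc_freq.items() if count >= min_docs}


def lookup_terms(text, dictionary):
    """Phase 2: count the dictionary terms that occur in one document."""
    return Counter(term for term in tokenize(text) if term in dictionary)


# Hypothetical representative documents.
corpus = [
    "Mining models need clean data",
    "Clean data improves mining results",
    "Reports summarize results",
]
dictionary = build_dictionary(corpus)

# Per-document term counts; rows like these would feed phase 3 (model training).
terms = lookup_terms("Clean data, clean models", dictionary)
```

The per-document counts play the role of the key words and phrases table that the classification, clustering, and association models take as input.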