4. Objective
By the end of the presentation you will understand:
The basic concepts of SSIS, including the control flow and data flow.
How to perform data mining-related transformations and tasks in SSIS.
The text mining solution based on the Term Extraction and Term Lookup transformations.
5. Agenda
Overview of SSIS
Working with SSIS in Data Mining
Text Mining Transformations
13. SSIS Tasks #1
Task                    Description
Bulk Insert             Loads large amounts of data from a text file into a SQL Server table.
Data Flow               Supports the copying and transformation of data between heterogeneous data sources.
Execute Package         Runs a sub-package.
Execute Process         Runs a program or a batch file as part of a package.
Execute SQL             Runs SQL statements during package execution and optionally saves the results of those queries.
File System             Performs file system operations.
File Transfer Protocol  Downloads data files from a remote server or an internet location as part of a package workflow.
14. SSIS Tasks #2
Task            Description
Message Queue   Uses message queuing to send and receive messages between SSIS packages.
Script          Uses a script to perform functions that are not available in the prebuilt SSIS tasks. The Script task lets you write scripts in Visual Basic .NET and C# using the Microsoft VSTA environment.
Send Mail       Sends an e-mail message.
XML             Merges, filters, and transforms data in XML documents.
Data Profiling  Analyzes (and maintains) the data quality. Provides column value distributions and statistics.
15. SSIS Transformations #1
Transformation                    Description
Aggregate                         Performs aggregations (such as average and sum).
Character Map                     Applies string functions to character data.
Conditional Split                 Routes data rows to different outputs based on specified criteria.
Copy Column                       Adds copies of input columns to the transformation output.
Data Conversion                   Converts the data type of a column to a different data type.
Derived Column                    Generates new columns derived from existing columns using expressions.
Dimension Processing Destination  Processes Analysis Services dimensions.
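To make the row-level behavior of two of these transformations concrete, here is a minimal plain-Python sketch of what Derived Column and Conditional Split do to a stream of rows. It is an illustration only, not the SSIS API: the function names, the sample `orders` rows, and the `total >= 100` rule are all assumptions made for the example.

```python
def derived_column(rows, name, expr):
    """Return copies of the rows with one extra column computed from existing columns."""
    return [{**row, name: expr(row)} for row in rows]


def conditional_split(rows, conditions, default="default"):
    """Route each row to the first output whose predicate matches, else to the default output."""
    outputs = {label: [] for label, _ in conditions}
    outputs[default] = []
    for row in rows:
        for label, predicate in conditions:
            if predicate(row):
                outputs[label].append(row)
                break
        else:
            # No condition matched, so the row goes to the default output.
            outputs[default].append(row)
    return outputs


# Hypothetical sample data.
orders = [
    {"id": 1, "qty": 2, "price": 10.0},
    {"id": 2, "qty": 50, "price": 3.0},
    {"id": 3, "qty": 1, "price": 500.0},
]

with_total = derived_column(orders, "total", lambda r: r["qty"] * r["price"])
routed = conditional_split(with_total, [("large", lambda r: r["total"] >= 100)])
```

With this sample data, orders 2 and 3 land in the "large" output and order 1 falls through to the default output.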
16. SSIS Transformations #2
Transformation  Description
Fuzzy Grouping  Performs data-cleansing tasks by identifying rows of data that are likely to be duplicates and by choosing a canonical row of data to use in standardizing the data.
Fuzzy Lookup    Looks up values in a reference table using fuzzy matching.
Lookup          Looks up values in a reference table using exact matching.
Merge           Merges two sorted data sets.
Merge Join      Joins two sorted data sets using a FULL, LEFT, or INNER join.
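The difference between Lookup and Fuzzy Lookup can be sketched in a few lines of Python. This is an illustration only: it uses the standard library's `difflib` as a stand-in similarity measure, which is not the algorithm SSIS uses, and the `reference` table, function names, and the 0.8 cutoff are assumptions made for the example.

```python
import difflib

# Hypothetical reference table: city name -> state code.
reference = {"Seattle": "WA", "Portland": "OR", "Boise": "ID"}


def exact_lookup(value, table):
    """Return the reference value only when the key matches exactly."""
    return table.get(value)


def fuzzy_lookup(value, table, cutoff=0.8):
    """Return the reference value of the closest key whose similarity is at least `cutoff`."""
    matches = difflib.get_close_matches(value, list(table), n=1, cutoff=cutoff)
    return table[matches[0]] if matches else None


misspelled = "Seatle"
exact_result = exact_lookup(misspelled, reference)   # no exact key, so None
fuzzy_result = fuzzy_lookup(misspelled, reference)   # close enough to "Seattle"
```

An exact lookup rejects the misspelled key, while the fuzzy version still resolves it to the "Seattle" row. A key with no similar entry returns no match in both cases.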
17. SSIS Transformations #3
Transformation                    Description
Multicast                         Distributes data sets to multiple outputs.
Partition Processing Destination  Processes Analysis Services partitions.
Pivot                             Creates a less normalized version of a normalized table.
Sort                              Sorts pipeline data.
Union All                         Creates a union of multiple data sets.
Unpivot                           Creates a more normalized version of non-normalized tables.
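As an illustration of what "more normalized" and "less normalized" mean here, the following plain-Python sketch unpivots a wide table into one row per (key, column) pair and then pivots it back. It is not the SSIS API: the function names, column names, and sample data are assumptions made for the example.

```python
def unpivot(rows, key, value_columns, name_col, value_col):
    """Turn each wide row into one narrow row per value column (more normalized)."""
    out = []
    for row in rows:
        for col in value_columns:
            if row.get(col) is not None:
                out.append({key: row[key], name_col: col, value_col: row[col]})
    return out


def pivot(rows, key, name_col, value_col):
    """Collapse narrow rows back into one wide row per key (less normalized)."""
    wide = {}
    for row in rows:
        wide.setdefault(row[key], {key: row[key]})[row[name_col]] = row[value_col]
    return list(wide.values())


# Hypothetical wide table: one column per quarter.
wide_rows = [
    {"customer": "A", "Q1": 10, "Q2": 20},
    {"customer": "B", "Q1": 5, "Q2": 7},
]

long_rows = unpivot(wide_rows, "customer", ["Q1", "Q2"], "quarter", "sales")
round_trip = pivot(long_rows, "customer", "quarter", "sales")
```

The unpivoted form has four rows of (customer, quarter, sales), and pivoting it again reproduces the original two wide rows.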
19. Data Mining in SSIS Environment
SSIS provides a flow environment for data extraction, loading, and transformation.
It can be used to load data from various sources, join them together, normalize column values, remove dirty records, replace missing values, split data into training and testing datasets, and so on.
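Two of the preparation steps listed above, replacing missing values and splitting data into training and testing sets, can be sketched in plain Python as follows. This is only an illustration of the idea, not how SSIS implements it: the function names, the mean-fill strategy, the 30% test fraction, the seed, and the sample `records` are all assumptions made for the example.

```python
import random
from statistics import mean


def fill_missing(rows, column):
    """Replace None in `column` with the mean of the known values (one possible strategy)."""
    known = [r[column] for r in rows if r[column] is not None]
    fill = mean(known)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and split them into (training, testing) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical input: every fourth record is missing its age.
records = [{"id": i, "age": (None if i % 4 == 0 else 20 + i)} for i in range(10)]

clean = fill_missing(records, "age")
train, test = train_test_split(clean)
```

Fixing the seed keeps the split repeatable, so a model trained on `train` can be re-evaluated against the same `test` rows later.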
20. Tasks and Transforms for Data Mining
Data Mining Query Task
Analysis Services Execute DDL Task
Data Mining Query Transformation
Data Mining Model Training Transformation
Analysis Services Processing Task
Term Extraction Transformation
Term Lookup Transformation
23. Text Mining Transformations
This section focuses on:
Term Extraction Transformation
Term Lookup Transformation
24. Model types for text mining
Classification Model
Uses the keywords and phrases nested table as input to predict the class of a document.
Clustering Model
Finds similar documents based on common occurrences of keywords and phrases.
Association Model
Detects cross-correlations between keywords and phrases.
25. Process of text mining
Term Extraction Transformation
Builds the dictionary of keywords and phrases over a collection of representative documents.
Term Lookup Transformation
Based on the dictionary, extracts the list of significant keywords and phrases for each document to be analyzed.
Model Training
Trains mining models on top of the transformed data.
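The first two steps of this process can be sketched in plain Python. This is a simplified illustration, not the SSIS transformations themselves: it treats every non-stop-word as a candidate term and scores by raw frequency only, and the stop-word list, function names, `min_frequency` threshold, and sample `corpus` are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the SSIS exclusion list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_terms(documents, min_frequency=2):
    """Step 1: build a dictionary of terms seen at least `min_frequency` times in the corpus."""
    counts = Counter(
        token
        for doc in documents
        for token in tokenize(doc)
        if token not in STOP_WORDS
    )
    return {term for term, n in counts.items() if n >= min_frequency}


def lookup_terms(document, dictionary):
    """Step 2: count how often each dictionary term occurs in one document."""
    return dict(Counter(t for t in tokenize(document) if t in dictionary))


# Hypothetical corpus of representative documents.
corpus = [
    "The package failed to load the data",
    "Load the data into the table",
    "The table is empty",
]

dictionary = extract_terms(corpus)

# One (document id, term, frequency) row per match, similar in shape to a nested table of terms.
rows = [
    (doc_id, term, freq)
    for doc_id, doc in enumerate(corpus)
    for term, freq in sorted(lookup_terms(doc, dictionary).items())
]
```

The resulting `rows` are the kind of per-document term list that a mining model would then be trained on in the third step.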