Business Intelligence SSIS Development Terry Thompson <br />Introduction:<br />For phase I of the Business Intelligence project I was responsible for designing and building a SQL Server 2005 database to track employee, client, division, timesheet, and employee labor rate data, as well as job order information. In the project scenario the data is currently stored and maintained in Excel spreadsheets and CSV files.<br />SQL Server 2005 Integration Services (SSIS) was used to integrate these external data sources into the SQL Server database. <br />Project Goals:<br /><ul><li>Understand the current source data and how the business maintains the information.
Develop a methodology using SSIS ETL processes to integrate the source data into a SQL 2005 database.
Apply standard comments, annotations, and SSIS component naming.
Follow best practices when designing ETL packages, staying cognizant of the system overhead needed to run each package.
The final SQL 2005 database will be the source for an Analysis Services (SSAS) solution that will support the development of custom data views of the database and the setup of a series of cubes, dimensions, and key performance indicators (KPIs) to analyze measures. </li></ul>Source Data Descriptions<br /><ul><li>C:SetFocusBISourceDataEmployees.XLS
CSV files that contain all the labor data transactions: the employee ID, the work date, the number of work hours, and the job number</li></ul>Destination Database Diagram for the categories of files<br /> <br />Overview of SQL Server Integration Services 2005 (SSIS) Package Design<br /><ul><li>Use of Data Flow Task to define data source type
Use of Data Conversion to ensure the incoming source data columns use the same data type as the corresponding target database columns.
Use of Aggregate transformation for packages that need to “collapse” repetitive data into unique values based on source columns that uniquely identify a row.
Use of Lookup against other tables to validate the source data rows before insert or update, making sure that incoming foreign keys are valid.
Use of Conditional Split to evaluate the target key column(s) for NULL values. A NULL key means the incoming row does not yet exist and should be inserted (using the OLE DB Destination); otherwise the row is updated (using the OLE DB Command).
Generate status emails reporting the rows inserted and any errors generated.
Develop a master package that executes all ETL packages and is scheduled to run nightly at midnight.
Create separate packages to handle nightly database backups, re-index all files, and shrink the database.</li></ul> Sample ETL Control Flow<br />The Data Flow Task (DFT) is the control flow task that moves the source data through the data flow transformations to the SQL 2005 database destination.<br />The Send Mail Task (SMT) generates a formatted email using scoped variables that hold the counts of rows inserted, updated, and/or in error in the data flow transformations. <br />Sample ETL Data Flow<br /><ul><li>Extract, Transform, and Load (ETL) defines the data source and source file type used in the transformation and load process.
Data conversion is required to reformat the source data columns to match the destination columns.
Derived Columns are used to create destination data not provided in the source.
Lookup determines whether a source row is a new row to be inserted or an existing row to be updated, based on the key columns. Lookup failures are ignored and passed to the Conditional Split, which evaluates the key columns for NULL values. If NULL (Lookup failed), the source row is inserted using the OLE DB Destination. If not NULL, the source row is updated using the OLE DB Command.
Package-level variables are used to maintain counts of the number of records updated, inserted, and/or in error. These variables are used in the Send Mail Task of the Control Flow to generate an email of package status at completion.</li></ul>Sample Master Execution Package<br />The master package bundles all packages required to populate the SQL 2005 database. Packages are executed in the appropriate sequence to ensure database integrity is maintained. Precedence constraints are used to manage the control flow between packages: each package runs only after the preceding package completes successfully. If a package encounters an error, the master package stops execution and sends an email notification. An email is also sent upon successful completion of all packages. <br />The SQL 2005 database is backed up and re-indexed in two final packages at the end of all ETL packages. <br />SSIS Project Final Deliverables<br />Solution:<br />SSISStudentProject.sln<br />Shared Data Source<br />All Works DB Student.ds<br />Packages<br /><ul><li>EmployeeMasterPackage.dtsx