David Weston SSIS Portfolio Presentation Transcript
Name: David Weston
Phone: (617) 692-0608
Business Intelligence Portfolio SQL Server Integration Services (SSIS)
Table of Contents
Overview (slide 3)
Data Model (slide 4)
Sample Package – Timesheet (slide 5)
Sample Package – Timesheet: Control Flow (slide 6)
Sample Package – Timesheet: Data Flow (slide 7)
Master Package (slide 10)
Database Maintenance Package (slide 11)
SQL Server Agent Job (slide 12)
AllWorks currently uses spreadsheets and Oracle data (exported as XML) as part of their systems. They store employee and client geography data, along with overhead and job order master data, in spreadsheets. The feed of material purchases comes from an XML file, and the timesheet data comes from CSV files.
Based on the file structures of these spreadsheets and XML files, create a normalized (3NF) OLTP database to hold all the data in the source files. Create an SSIS package for each of the tables in the data model to read in the source files, validate the data, and load it. Create a master package to run all the individual packages in the appropriate order. Then create a database maintenance package to back up, shrink, and re-index the database after each load. Finally, create and schedule a SQL Server Agent job to run the entire process nightly. The initial run will populate the tables for the first time, and updates will occur nightly.
Part 1: Create the Data Model
Note: AllWorks currently allows only one work order project per invoice. They want to allow a single invoice to cover multiple projects and, when AllWorks receives payment, to track how much was received for each project on each invoice. This many-to-many relationship between Projects and Invoices made it necessary to create a cross-reference (or bridge) table between the Project and Invoice tables.
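As a hedged sketch, the bridge table might look like the following T-SQL. The table and column names here are assumptions for illustration, not the actual AllWorks schema; the key idea is the composite primary key plus a payment amount per project per invoice.

```sql
-- Hypothetical bridge table between Project and Invoice
-- (names are assumptions, not the actual AllWorks schema).
-- Each row records how much of an invoice's payment applies to one project.
CREATE TABLE ProjectInvoice (
    ProjectID      INT   NOT NULL,
    InvoiceID      INT   NOT NULL,
    AmountReceived MONEY NOT NULL DEFAULT (0),
    CONSTRAINT PK_ProjectInvoice PRIMARY KEY (ProjectID, InvoiceID),
    CONSTRAINT FK_ProjectInvoice_Project FOREIGN KEY (ProjectID)
        REFERENCES Project (ProjectID),
    CONSTRAINT FK_ProjectInvoice_Invoice FOREIGN KEY (InvoiceID)
        REFERENCES Invoice (InvoiceID)
);
```

The composite key enforces that a project appears at most once per invoice, while the two foreign keys resolve the many-to-many relationship into two one-to-many relationships.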
Part 2: Create SSIS Packages
Create an SSIS package to load each of the above tables (with the exception of MaterialType, which is static and currently contains only three valid material types). Data must be validated as it is loaded, and invalid data should be written out to log files. Success and failure emails should be sent upon the completion of each package; the success emails should contain the number of rows updated, the number of rows inserted, and the number of invalid rows.
Sample Package: Timesheet
The Timesheet package should loop through all the timesheets in the /time folder, loading all the data into the Timesheet table. Once the files are processed, they should be moved to the /time/processed folder.
Timesheet Package: Control Flow
The first step cleans up any old log files that may remain from prior runs. A ForEach Loop Container then loops through all timesheet files (the files follow the naming convention EmpTime*.csv) and processes them one at a time. A Script Task accumulates the inserted/updated/invalid row totals for all files processed, and a success or failure email is sent with the row counts.
Timesheet Package: Data Flow
The first steps of the data flow are to read in the source file, convert the data to SQL Server types, and perform a Lookup task to determine whether the records are updates or inserts.
Update Logic: If the source record corresponds to an existing row, a Conditional Split task verifies that the record contains changed data (we want to avoid unnecessary updates). A Row Count task then updates the UpdatedRows variable (used to accumulate total row counts in the control flow). Finally, an OLE DB Command task updates the EmpTime table.
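The statement behind the OLE DB Command at the end of the update path would be a parameterized UPDATE along these lines. This is a sketch only; the column list and key columns are assumptions about the EmpTime table, and SSIS maps pipeline columns to the `?` markers in order.

```sql
-- Hypothetical parameterized UPDATE run by the OLE DB Command.
-- Column names are assumptions; SSIS binds pipeline columns
-- to the ? placeholders positionally.
UPDATE EmpTime
SET    ProjectID   = ?,
       HoursWorked = ?
WHERE  EmployeeID  = ?
  AND  WorkDate    = ?;
```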
Timesheet Package: Data Flow
Insert Logic: When the incoming records are inserts, validation must be performed on them. First a Lookup task validates the ProjectID; then the EmployeeID is validated the same way. The final validation ensures that the WorkDate on the timesheet is prior to the Close Date of the project and that the project's Closed flag has not been set to true. Once validated, the records to be inserted are counted (and again assigned to a variable to be tallied in the control flow) before being inserted into the EmpTime table. Invalid records are sent to a log file for inspection.
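Taken together, the three lookup validations are roughly equivalent to the following query. Table and column names are assumptions for illustration; in the package the checks run as separate Lookup and Conditional Split steps so that each failure can be logged individually.

```sql
-- Rough SQL equivalent of the three insert validations
-- (StagedTimesheet and the column names are assumptions).
-- A timesheet row is valid only when both lookups match and the
-- work date precedes the close date of a still-open project.
SELECT t.*
FROM   StagedTimesheet AS t
       JOIN Project  AS p ON p.ProjectID  = t.ProjectID
       JOIN Employee AS e ON e.EmployeeID = t.EmployeeID
WHERE  t.WorkDate < p.CloseDate
  AND  p.ClosedFlag = 0;
```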
Master Package
The master package, pictured to the left, uses a Sequence Container to execute each individual package in order, based on data dependencies. Packages with no dependencies on one another can run simultaneously. When the sequence completes, it executes the Database Maintenance package.
Database Maintenance Package
The final package to execute is the Database Maintenance package. It performs routine cleanup on the database after loading a large number of records: it shrinks the database, rebuilds indexes, updates statistics, and backs up the database. This ensures that the indexes and internal statistics of the database are up to date when SQL Server determines execution plans in the future.
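In T-SQL terms, the four maintenance steps correspond roughly to the commands below. The database name, table name, and backup path are assumptions; in the package these steps run as SSIS maintenance tasks rather than a single script.

```sql
-- Rough T-SQL equivalents of the maintenance package's four steps
-- (database name, table name, and backup path are assumptions).
DBCC SHRINKDATABASE (AllWorks);             -- shrink the database
ALTER INDEX ALL ON dbo.EmpTime REBUILD;     -- rebuild indexes (one table shown)
EXEC sp_updatestats;                        -- update statistics database-wide
BACKUP DATABASE AllWorks
    TO DISK = N'C:\Backups\AllWorks.bak'
    WITH INIT;                              -- full database backup
```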
SQL Server Agent Job
Finally, all packages were deployed to SQL Server, and a job was created using SQL Server Agent to run the Master Package daily at noon.
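A job like this is typically built in the Agent wizard, but the equivalent msdb script is sketched below. The job name, step name, and package path are assumptions; the schedule matches the daily-at-noon run described above.

```sql
-- Hedged sketch of creating the job via msdb stored procedures
-- (job name and package path are assumptions).
USE msdb;
EXEC sp_add_job         @job_name = N'AllWorks Nightly Load';
EXEC sp_add_jobstep     @job_name  = N'AllWorks Nightly Load',
                        @step_name = N'Run Master Package',
                        @subsystem = N'SSIS',
                        @command   = N'/FILE "C:\Packages\MasterPackage.dtsx"';
EXEC sp_add_jobschedule @job_name  = N'AllWorks Nightly Load',
                        @name      = N'Daily at noon',
                        @freq_type = 4,                -- daily
                        @freq_interval = 1,            -- every 1 day
                        @active_start_time = 120000;   -- 12:00:00
EXEC sp_add_jobserver   @job_name = N'AllWorks Nightly Load';  -- target local server
```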