SSIS Project Profile

Business Intelligence SSIS Development
Terry Thompson

Introduction:
For phase I of the Business Intelligence project, I was responsible for designing and building a SQL Server 2005 database to track employee, client, division, timesheet, employee labor rate, and job order data. In the project scenario, this data is currently stored and maintained in Excel spreadsheets and CSV files.
SQL Server 2005 Integration Services (SSIS) was used to integrate these external data sources into the SQL Server database.

Project Goals:
• Understand the current source data and how the business maintains it.
• Develop a methodology that uses SSIS ETL processes to integrate the source data into a SQL Server 2005 database.
• Apply comments, annotations, and SSIS component naming according to a consistent standard.
• Follow best practices when designing ETL packages, keeping in mind the system overhead needed to run each package.
• Provide adequate error handling.
• Prepare the final SQL Server 2005 database as the source for an Analysis Services (SSAS) solution that supports custom data views of the database and a series of cubes, dimensions, and key performance indicators (KPIs) for analyzing measures.

Source Data Descriptions:
• C:SetFocusBISourceDataEmployees.XLS
  - First sheet (employees): roster of employees
  - Second sheet (employee rates): hourly rates, each with an effective date
• C:SetFocusBISourceDataClientGeographies.XLS
  - First sheet (Client Listing): each client, along with a CountyKey
  - Second sheet (County Definitions): the list of counties
  - Third sheet (Division Definitions): the list of divisions and their county associations
  - Fourth sheet (Special Groupings): groupings of clients, which provide another dimension for client aggregation
• C:SetFocusBISourceDataProjectMaster.xls
  - First sheet (Project Master): one row per job work order, containing a reference to the client, the job's closed status, and the creation date
• C:SetFocusBISourceDatatimeEmptimeXXXX.CSV
  - CSV files that contain all labor data transactions: the employee ID, the work date, the number of hours worked, and the job number

[Figure: Destination database diagram for the categories of files]

Overview of SQL Server 2005 Integration Services (SSIS) Package Design:
• Use a Data Flow Task to define the data source type.
• Use Data Conversion transformations so that the incoming source columns have the same data types as the corresponding target database columns.
• Use the Aggregate transformation in packages that need to "collapse" repetitive data into unique values, based on the source columns that uniquely identify a row.
• Use Lookups against other tables to validate source rows before insert or update, so that incoming foreign keys are valid.
• Use Conditional Splits to test the looked-up target key column(s) for NULL values. A NULL key means the incoming row is new and is inserted (using the OLE DB Destination); a non-NULL key means the row already exists and is updated (using the OLE DB Command).
• Generate email notifications that report the rows inserted and any errors generated.
• Develop a master package that executes all ETL packages and is scheduled to run nightly at midnight.
• Create separate packages to handle the nightly database backup, re-index the database, and shrink the database.

Sample ETL Control Flow:
The Data Flow Task (DFT) is the control flow task that moves the source data through the data flow transformations into the SQL Server 2005 destination database.
The Send Mail Task (SMT) generates a formatted email using scoped variables that hold the counts of rows inserted, updated, and/or in error during the data flow.

Sample ETL Data Flow:
• The source component (the extract step of the extract, transform, and load (ETL) process) defines the data source and source file type used by the transformation and load steps.
• Data Conversion reformats the source columns to match the destination column data types.
• Derived Columns create destination values that are not provided by the source.
• A Lookup on the key columns determines whether a source row is new (to be inserted) or already exists (to be updated). Lookup failures are ignored, and rows pass to a Conditional Split that tests the key columns for NULL values. If the key is NULL (the Lookup failed), the source row is inserted using the OLE DB Destination; if it is not NULL, the existing row is updated using the OLE DB Command.
• Package-level variables maintain counts of the records updated, inserted, and/or in error. The Send Mail Task in the control flow uses these variables to email the package status on completion.

Sample Master Execution Package:
The master package bundles all packages required to populate the SQL Server 2005 database and executes them in the sequence needed to maintain database integrity. Precedence constraints manage the control flow between packages: when a package completes successfully, the next package is executed. If a package encounters an error, the master package stops execution and sends an email notification. An email is also sent when all packages complete successfully.
After all ETL packages have run, two final packages back up and re-index the SQL Server 2005 database.

SSIS Project Final Deliverables:
Solution: SSISStudentProject.sln
Shared Data Source: All Works DB Student.ds
Packages:
• EmployeeMasterPackage.dtsx
• EmployeeRatePackage.dtsx
• ClientMasterPackage.dtsx
• ClientMasterGroupingPackage.dtsx
• DivisionMasterPackage.dtsx
• ClientGroupingsXRefPackage.dtsx
• ProjectJobMasterPackage.dtsx
• ProjectJobTimeSheetPackage.dtsx
• AllWorksDBStudentBackup.dtsx
• ReindexandCompressDB.dtsx
• AllWorksDBStudentMasterPackage.dtsx
All packages were deployed to SQL Server using dtsinstall.exe.
The master package is scheduled to run via SQL Server Agent.
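The Aggregate transformation described in the package design overview (collapsing repetitive source data into unique values) can be illustrated with a small Python sketch. This is not SSIS code; it only models a Group by on the chosen columns plus a row Count per group. The column names (Client, GroupName) and the idea of deriving unique grouping names from the Special Groupings sheet are hypothetical examples.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def collapse_to_unique(rows: Iterable[Dict[str, str]], group_by: Sequence[str]) -> List[Dict[str, object]]:
    """Group rows on the given columns and count each group (like Aggregate: Group by + Count)."""
    # Counter keeps first-seen order, so the output follows the order of the source rows.
    groups = Counter(tuple(row[col] for col in group_by) for row in rows)
    # Emit one row per unique key, plus how many source rows collapsed into it.
    return [{**dict(zip(group_by, key)), "RowCount": n} for key, n in groups.items()]


# Hypothetical source rows: several clients share the same grouping name.
special_groupings = [
    {"Client": "A", "GroupName": "Gov"},
    {"Client": "B", "GroupName": "Gov"},
    {"Client": "C", "GroupName": "Retail"},
]
unique_groups = collapse_to_unique(special_groupings, ["GroupName"])
```

Here `unique_groups` holds one row per grouping name, which is the shape a grouping dimension table would be loaded from.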
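The Sample ETL Data Flow pattern (Data Conversion, a foreign-key Lookup, a key Lookup with failures ignored, then a Conditional Split into insert or update, with counts feeding the status email) can be modeled in a short Python sketch. It is an illustration of the logic only, not SSIS: a dictionary stands in for the destination table, and the column names (EmployeeID, FullName, DivisionID), the division foreign key, and the email text format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

SourceRow = Dict[str, str]  # hypothetical: raw text columns as read from Excel/CSV
TargetTable = Dict[int, Tuple[str, int]]  # hypothetical: EmployeeID -> (FullName, DivisionID)


@dataclass
class RunCounts:
    """Stand-in for the package-level count variables read by the Send Mail Task."""
    inserted: int = 0
    updated: int = 0
    errors: int = 0


def convert(row: SourceRow) -> Tuple[int, str, int]:
    """Data Conversion: cast the text source columns to the destination column types."""
    return int(row["EmployeeID"]), row["FullName"].strip(), int(row["DivisionID"])


def upsert_rows(source: List[SourceRow], target: TargetTable, valid_divisions: Set[int]) -> RunCounts:
    counts = RunCounts()
    for raw in source:
        try:
            emp_id, name, division = convert(raw)
        except (KeyError, ValueError):
            counts.errors += 1  # conversion failure: counted as an error row
            continue
        # Foreign-key Lookup: a DivisionID missing from the reference table is invalid.
        if division not in valid_divisions:
            counts.errors += 1
            continue
        # Key Lookup with failures ignored: None plays the role of the NULL lookup column.
        matched_key: Optional[int] = emp_id if emp_id in target else None
        # Conditional Split on the looked-up key.
        if matched_key is None:
            target[emp_id] = (name, division)  # new row: OLE DB Destination (insert)
            counts.inserted += 1
        else:
            target[matched_key] = (name, division)  # existing row: OLE DB Command (update)
            counts.updated += 1
    return counts


def status_message(counts: RunCounts) -> str:
    """Body text for the completion email (format is illustrative)."""
    return f"Inserted: {counts.inserted}, Updated: {counts.updated}, Errors: {counts.errors}"
```

One design point worth keeping in mind: in SSIS the OLE DB Command executes its statement once per row, so the update branch is the more expensive path for large loads.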
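The master package rules described under Sample Master Execution Package (run each package only after the previous one succeeds, stop and send an email on the first failure, send an email when everything succeeds) can be sketched in Python. This models the behavior only; in SSIS the same effect is typically achieved with Success precedence constraints between Execute Package tasks and Send Mail tasks. Each package is represented here as a hypothetical name plus a callable that raises on failure, and the message wording is illustrative.

```python
from typing import Callable, List, Tuple

# Hypothetical: each child package is modeled as a name plus a callable that raises on failure.
Package = Tuple[str, Callable[[], None]]


def run_master(packages: List[Package], send_mail: Callable[[str], None]) -> List[str]:
    """Run packages in order; stop and notify on the first failure (Success precedence constraints)."""
    completed: List[str] = []
    for name, execute in packages:
        try:
            execute()
        except Exception as exc:  # a failed package breaks the Success constraint chain
            send_mail(f"Master package stopped: {name} failed ({exc})")
            return completed
        completed.append(name)
    send_mail(f"All {len(completed)} packages completed successfully")
    return completed
```

For example, if the second of three packages raises an error, only the first is reported as completed and a single failure email is sent; the package order used in such a test is for illustration and is not the actual sequence in AllWorksDBStudentMasterPackage.dtsx, which the text does not give.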