SSIS Project Profile

Transcript

  • 1. Business Intelligence SSIS Development, Terry Thompson
    Introduction:
    For Phase I of the Business Intelligence project, I was responsible for designing and building a SQL Server 2005 database to track employee, client, division, timesheet, and employee labor rate data, as well as job order information. In the project scenario, the data is currently stored and maintained in Excel spreadsheets and CSV files.
    SQL Server 2005 Integration Services (SSIS) was used to integrate these external data sources into the SQL Server database.
    Project Goals:
    • Understand the current source data and how the business maintains it.
    • Develop a methodology that uses SSIS ETL processes to integrate the source data into a SQL Server 2005 database.
    • Apply comments, annotations, and SSIS component naming consistently.
    • Follow best practices when designing ETL packages, staying cognizant of the system overhead needed to run each package.
    • Provide adequate error handling.
    • The final SQL Server 2005 database will be the source for an Analysis Services (SSAS) solution that supports custom data views of the database and a series of cubes, dimensions, and key performance indicators (KPIs) for analyzing measures.
    Source Data Descriptions
    • C:SetFocusBISourceDataEmployees.XLS
      • First sheet (employees): roster of employees
      • Second sheet (employee rates): hourly rates, each with an effective date
    • C:SetFocusBISourceDataClientGeographies.XLS
      • First sheet (Client Listing): each client, along with a CountyKey
      • Second sheet (County Definitions): the list of counties
      • Third sheet (Division Definitions): the list of divisions and their county associations
      • Fourth sheet (Special Groupings): groupings for clients, which provide another dimension for client aggregation
    • C:SetFocusBISourceDataProjectMaster.xls
      • First sheet (Project Master): one row for each job work order, with a reference to the client, the job closed status, and the creation date
    • C:SetFocusBISourceData imeEmptimeXXXX.CSV
      • CSV files that contain all the labor data transactions: the employee id, the work date, the number of work hours, and the job number
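As a rough illustration of the shape of the timesheet files, the Python sketch below parses a few CSV rows into typed records. The project itself read these files with SSIS, not Python. The column order, the lack of a header row, the `%m/%d/%Y` date format, and the names `TimeEntry` and `parse_timesheet` are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class TimeEntry(NamedTuple):
    employee_id: int
    work_date: datetime
    work_hours: float
    job_number: str


def parse_timesheet(text: str) -> list[TimeEntry]:
    """Parse CSV text with the assumed column order:
    employee id, work date, work hours, job number."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        emp, date, hours, job = (field.strip() for field in row)
        entries.append(
            TimeEntry(int(emp), datetime.strptime(date, "%m/%d/%Y"), float(hours), job)
        )
    return entries


# Made-up sample rows, not taken from the project data.
sample = "101,01/15/2005,8.0,4001\n102,01/15/2005,6.5,4002\n"
entries = parse_timesheet(sample)
```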
    Destination Database Diagram for the categories of files

    Overview of SQL Server 2005 Integration Services (SSIS) Package Design
    • Data Flow Task to define the data source type.
    • Data Conversion to ensure that incoming source columns use the same data types as the corresponding target database columns.
    • Aggregate transformation in packages that need to “collapse” repetitive data into unique values, based on the source columns that uniquely identify a row.
    • Lookup against other tables to validate source rows before insert or update, ensuring that incoming foreign keys are valid.
    • Conditional Split to evaluate the target key column(s) for NULL values. A NULL key means the incoming row is new and should be inserted (using the OLE DB Destination); otherwise the existing row is updated (using the OLE DB Command).
    • Emails reporting the rows inserted and any errors generated.
    • A master package that executes all ETL packages, scheduled to run nightly at midnight.
    • Separate packages for nightly database backups and for re-indexing and shrinking the database.
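The conversion, aggregation, and foreign-key validation steps in the list above can be sketched outside SSIS in plain Python. This is only an analogy for what the Data Conversion, Aggregate, and Lookup components do, not the packages' actual implementation. The column names (`client_id`, `client_name`, `county_key`) and the function names are hypothetical.

```python
from typing import Iterable


def convert_row(raw: dict[str, str]) -> dict:
    """Data Conversion analogue: cast text columns to the target column types."""
    return {
        "client_id": int(raw["client_id"]),
        "client_name": raw["client_name"].strip(),
        "county_key": int(raw["county_key"]),
    }


def collapse(rows: Iterable[dict], key_columns: list[str]) -> list[dict]:
    """Aggregate analogue: keep the first row seen for each unique key."""
    unique: dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        unique.setdefault(key, row)
    return list(unique.values())


def validate_foreign_key(rows: Iterable[dict], column: str, valid_keys: set):
    """Lookup analogue: separate rows whose foreign key exists from those that do not."""
    valid, rejected = [], []
    for row in rows:
        (valid if row[column] in valid_keys else rejected).append(row)
    return valid, rejected


# Made-up sample data: a duplicate client and one unknown county key.
raw_rows = [
    {"client_id": "1", "client_name": " Acme ", "county_key": "10"},
    {"client_id": "1", "client_name": "Acme", "county_key": "10"},
    {"client_id": "2", "client_name": "Bolt", "county_key": "99"},
]
converted = [convert_row(r) for r in raw_rows]
unique_rows = collapse(converted, ["client_id"])
valid_rows, rejected_rows = validate_foreign_key(unique_rows, "county_key", {10, 20})
```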
    Sample ETL Control Flow
    The Data Flow Task (DFT) is the control flow task that moves the source data through the data flow transformations to the SQL Server 2005 database destination.
    The Send Mail Task (SMT) generates a formatted email using scoped variables that hold the counts of rows inserted, rows updated, and errors found in the data flow transformations.
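To illustrate the kind of status message described here, the following Python sketch builds an email from row counts using the standard library's `email.message.EmailMessage`. It only constructs the message and does not send it. The subject wording, body layout, addresses, and the function name `build_status_email` are assumptions, not the format the actual Send Mail Task produced.

```python
from email.message import EmailMessage


def build_status_email(package: str, inserted: int, updated: int, errors: int,
                       sender: str, recipient: str) -> EmailMessage:
    """Compose a package status email from the run's row counts."""
    status = "completed with errors" if errors else "completed"
    msg = EmailMessage()
    msg["Subject"] = f"{package}: {status}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Package: {package}\n"
        f"Rows inserted: {inserted}\n"
        f"Rows updated: {updated}\n"
        f"Rows in error: {errors}\n"
    )
    return msg


# Hypothetical counts and addresses, for illustration only.
status_msg = build_status_email(
    "EmployeeMasterPackage", inserted=12, updated=3, errors=0,
    sender="etl@example.com", recipient="dba@example.com",
)
```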
    Sample ETL Data Flow
    • The Extract, Transform, and Load (ETL) process starts by defining the data source and the source file type used in the transformation and load steps.
    • Data Conversion reformats the source data columns to match the destination columns.
    • Derived Column creates destination data that is not provided in the source.
    • Lookup determines whether a source row is a new row to be inserted or an existing row to be updated, based on the key columns. Lookup failures are ignored and the row is passed to the Conditional Split, which evaluates the key columns for NULL values. If they are NULL (the Lookup found no match), the source row is inserted using the OLE DB Destination. If they are not NULL, the existing row is updated using the OLE DB Command.
    • Package-level variables maintain counts of the records updated, inserted, and in error. The Send Mail Task in the Control Flow uses these variables to generate a package status email at completion.
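The Lookup and Conditional Split pattern above amounts to an insert-or-update ("upsert") decision per row, with counters kept along the way. A minimal Python sketch of that logic follows, with a dictionary standing in for the target table. The function name `upsert`, the counter names, and the sample rows are illustrative assumptions.

```python
def upsert(source_rows: list[dict], target: dict, key: str) -> dict[str, int]:
    """Insert rows whose key is not in the target, update rows whose key is.

    `target` maps key values to rows and stands in for the destination table.
    Returns counts comparable to the package-level variables described above.
    """
    counts = {"inserted": 0, "updated": 0, "errors": 0}
    for row in source_rows:
        if key not in row:
            counts["errors"] += 1  # row cannot be matched at all
            continue
        existing = target.get(row[key])  # lookup; None plays the role of NULL
        if existing is None:
            target[row[key]] = dict(row)  # no match: insert path
            counts["inserted"] += 1
        else:
            existing.update(row)  # match found: update path
            counts["updated"] += 1
    return counts


# Made-up sample data.
target_table = {1: {"id": 1, "name": "Old"}}
source = [{"id": 1, "name": "New"}, {"id": 2, "name": "Added"}, {"name": "NoKey"}]
counts = upsert(source, target_table, "id")
```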
    Sample Master Execution Package
    The master package bundles all packages required to populate the SQL Server 2005 database. Packages are executed in sequence to maintain database integrity, with precedence constraints managing the control flow between them: each package runs only after the previous one completes successfully. If a package encounters an error, the master package stops execution and sends an email notification. An email is also sent upon successful completion of all packages.
    The SQL Server 2005 database is backed up and re-indexed in two final packages that run after all ETL packages.
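The sequencing behavior described for the master package (run in order, stop on the first failure, notify either way) can be sketched in Python as below. This models the effect of success precedence constraints and is not how SSIS implements them. The names `run_master`, `notify`, and the stand-in package callables are hypothetical.

```python
from typing import Callable


def run_master(packages: list[tuple[str, Callable[[], None]]],
               notify: Callable[[str], None]) -> bool:
    """Run packages in order; stop and notify on the first failure."""
    for name, run in packages:
        try:
            run()
        except Exception as exc:
            notify(f"{name} failed: {exc}")
            return False  # later packages are not executed
    notify("All packages completed successfully")
    return True


# Stand-in packages: the second one fails, so the third never runs.
ran: list[str] = []
log: list[str] = []


def ok(name: str) -> Callable[[], None]:
    return lambda: ran.append(name)


def failing() -> None:
    raise RuntimeError("bad data")


result = run_master(
    [("EmployeeMasterPackage", ok("EmployeeMasterPackage")),
     ("EmployeeRatePackage", failing),
     ("ClientMasterPackage", ok("ClientMasterPackage"))],
    log.append,
)
```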
    SSIS Project Final Deliverables
    Solution:
    SSISStudentProject.sln
    Shared Data Source
    All Works DB Student.ds
    Packages
    • EmployeeMasterPackage.dtsx
    • EmployeeRatePackage.dtsx
    • ClientMasterPackage.dtsx
    • ClientMasterGroupingPackage.dtsx
    • DivisionMasterPackage.dtsx
    • ClientGroupingsXRefPackage.dtsx
    • ProjectJobMasterPackage.dtsx
    • ProjectJobTimeSheetPackage.dtsx
    • AllWorksDBStudentBackup.dtsx
    • ReindexandCompressDB.dtsx
    • AllWorksDBStudentMasterPackage.dtsx
    All packages deployed to SQL Server via dtsinstall.exe
    Master Package scheduled to run via SQL Server Agent