This document provides an overview of SQL Server Integration Services (SSIS), including its lifecycle, performance considerations, and deployment. It discusses SSIS components, buffers and memory usage, optimization strategies using the OVAL method, measuring performance, and manageability features such as logging, configurations, and checkpoints. The deployment process and tools are also outlined. The presentation does not provide prescriptive guidance for specific situations.
3. Objectives and Takeaways
A high level view
Design considerations
How to measure performance
Performance implications of architecture
Manageability aspects of SSIS
Deployment tips
Out of scope
Prescriptive guidance for specific situations
4. Agenda
Quick Introduction
Understanding Buffers and Memory
OVAL Concept Detailed
Component Specific Notes
Manageability Features
Deployment Considerations
6. SSIS Life Cycle tools
Design the SSIS Package
Business Intelligence Development Studio (Visual Studio)
Migration wizard for pre-SQL Server 2005 (DTS) packages
Version Control Integration (VSS)
Deployment/Execution
Deployment Utility to copy packages
Command Line execution (dtexec.exe and dtexecui.exe)
Flexible Configuration Options
Supportability
Rich per package Logging
SQL Server Management Studio for monitoring running packages and organizing stored packages
Checkpoint - Restartability
8. Buffers and Memory
Buffers are based on design-time metadata
The width of a row determines the size of the buffer
Smaller rows = more rows in memory = greater efficiency
Memory copies are expensive!
A buffer might have placeholder columns filled by downstream components
Pointer magic where possible
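The relationship between row width and buffer capacity can be sketched as follows. This assumes the 2005-era Data Flow task defaults of DefaultBufferSize = 10 MB and DefaultBufferMaxRows = 10,000; the function itself is illustrative, not the engine's actual sizing code.

```python
# Sketch: how SSIS-style buffer sizing follows from row width.
# Assumed defaults: DefaultBufferSize = 10 MB, DefaultBufferMaxRows = 10,000
# (both are Data Flow task properties).

DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000

def rows_per_buffer(row_width_bytes: int) -> int:
    """Rows that fit in one buffer: capped by both size and row count."""
    return min(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE // row_width_bytes)

# A narrow 100-byte row is capped by DefaultBufferMaxRows...
print(rows_per_buffer(100))    # 10000
# ...while a wide 4 KB row fits far fewer rows per buffer.
print(rows_per_buffer(4096))   # 2560
```

This is why narrower rows mean more rows in memory per buffer, and hence fewer buffers for the same volume of data.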
9. Component Types
Row-based (synchronous outputs)
Logically works at a row level; buffer reused
Examples: Data Conversion, Derived Column
Partially blocking (asynchronous outputs)
May logically work at a row level; data copied to new buffers
Examples: Merge, Merge Join, Union All
Blocking (asynchronous outputs)
Needs all input buffers before producing any output rows; data copied to new buffers
Examples: Aggregate, Sort
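The memory behavior that separates these component types can be sketched as follows: a synchronous component works on the buffer it was handed, while an asynchronous one hands back a new buffer. The functions and sample rows are hypothetical stand-ins, not SSIS APIs.

```python
# Sketch contrasting the component types above: a synchronous component
# mutates the buffer in place, while an asynchronous one copies rows
# into a new buffer.

def derived_column(buffer):           # row-based / synchronous: buffer reused
    for row in buffer:
        row["full"] = row["first"] + " " + row["last"]
    return buffer                     # the same object comes back

def sort_component(buffer, key):      # blocking / asynchronous: new buffer
    # Must consume all input before producing any output rows.
    return sorted(buffer, key=lambda r: r[key])

buf = [{"first": "Ada", "last": "B"}, {"first": "Al", "last": "A"}]
out = derived_column(buf)
print(out is buf)                          # True: no copy
print(sort_component(buf, "last") is buf)  # False: data copied
```

The copy is what makes blocking and partially blocking components comparatively expensive: "memory copies are expensive!"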
10. CPU Utilization
Execution Tree
Starts from a source or an async output
Ends at a destination or an input that has no sync outputs
Each execution tree can get a worker thread
EngineThreads property to control parallelism
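The tree-counting rule above can be sketched as a simple tally: every source and every asynchronous output starts a new execution tree, and each tree is a candidate for a worker thread. The component names and kinds below are made up for illustration.

```python
# Sketch: counting execution trees in a simplified data flow.
# Rule from the slide: a tree starts at a source or at an asynchronous
# output; synchronous components stay in the current tree.

def count_execution_trees(components):
    """components: list of (name, kind) where kind is 'source',
    'sync' (synchronous output), or 'async' (asynchronous output)."""
    return sum(1 for _, kind in components if kind in ("source", "async"))

flow = [
    ("Flat File Source",   "source"),
    ("Derived Column",     "sync"),   # row-based: same tree
    ("Sort",               "async"),  # blocking: starts a new tree
    ("OLE DB Destination", "sync"),
]
print(count_execution_trees(flow))  # 2 -> up to 2 worker threads
```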
11. Performance Strategy
Use OVAL to identify the factors affecting data integration performance…
Operations: What logic should be applied to the data?
Volume: How much data must be processed?
Application: Which application is best suited to these operations on this volume of data? For example, should SQL Server or SSIS sort the data?
Location: Where should the application run? For example, on a shared server, or on a standalone machine?
12. An OVAL Example: Loading a Text File
Simple scenario…
Text file on Server 1; SQL Server on Server 2
Interesting performance considerations!
13. Operations
Understand all operations performed
1. Open a transaction on SQL Server
2. Read data from the text file
3. Load data into the SSIS data flow
4. Load the data into SQL Server
5. Commit the transaction
Beware of hidden operations
Data conversion in either step 3 or 4
14. Volume
Reduce where possible
Don’t push unneeded columns
Conditional split for filtering rows
Do not parse or convert columns unnecessarily
In a fixed-width format you can combine adjacent unneeded columns into one
Leave unneeded columns as strings
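The volume-reduction advice above can be sketched as filtering rows and pruning columns before anything downstream sees them. The rows, column names, and filter predicate here are invented for illustration.

```python
# Sketch: reduce volume early, per the slide's advice. Rows and the
# region filter are hypothetical.

rows = [
    {"id": "1", "region": "EU", "amount": "10.5", "notes": "x" * 50},
    {"id": "2", "region": "US", "amount": "7.25", "notes": "y" * 50},
    {"id": "3", "region": "EU", "amount": "3.00", "notes": "z" * 50},
]

KEEP = ("id", "amount")  # don't push unneeded columns downstream

# Conditional-split-style row filter, then column pruning. Note that
# amount stays a string: no parsing or conversion until needed.
slim = [{c: r[c] for c in KEEP} for r in rows if r["region"] == "EU"]
print(slim)  # [{'id': '1', 'amount': '10.5'}, {'id': '3', 'amount': '3.00'}]
```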
15. Application
Is SSIS right for this?
Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.
Is BCP good enough?
Is the greater manageability and control of SSIS needed?
Bulk Import Task vs. Data Flow
16. Location
Consider the following configuration …
Text file on Server 1; SQL Server on Server 2
Where should SSIS run?
(Licensing issues aside)
17. Measuring Performance
OVAL does not provide prescriptive guidance
Too many variables
Improve performance by applying OVAL and
measuring
SSIS Logging
Performance counters
SQL Server Profiler
For extract queries, lookups and loading
18. Parallelism
Focus on critical path
Utilize available resources
Identify the bottleneck: memory constrained, reader constrained, or CPU constrained
Let it rip! Optimize the slowest
20. Manageability Features
Logging and Log Providers
Checkpoint Restartability
Precedence Constraints
Configurations
SSIS Service
21. Checkpointing
Package loads: checkpoint file created
Data Flow Task completes: write checkpoint
Data Flow Task completes: write checkpoint
Send Mail Task completes: write checkpoint
Package completes: checkpoint file deleted
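The checkpoint sequence above can be sketched as a restartable runner: each completed task is recorded in a checkpoint file, a restart skips recorded tasks, and the file is deleted when the package finishes. The file name, task names, and failure injection are all hypothetical; this is not the SSIS checkpoint format.

```python
# Sketch of checkpoint-style restartability, assuming a made-up JSON
# checkpoint file rather than the real SSIS format.

import json, os, tempfile

def run_package(tasks, checkpoint_path, fail_at=None):
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)           # resume from the checkpoint
    executed = []
    for task in tasks:
        if task in done:
            continue                      # completed in a previous run
        if task == fail_at:
            raise RuntimeError(f"{task} failed")  # checkpoint file survives
        executed.append(task)
        done.append(task)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)            # write checkpoint after each task
    os.remove(checkpoint_path)            # package completed: file deleted
    return executed

tasks = ["Data Flow Task 1", "Data Flow Task 2", "Send Mail Task"]
cp = os.path.join(tempfile.mkdtemp(), "package.chk")
try:
    run_package(tasks, cp, fail_at="Send Mail Task")
except RuntimeError:
    pass
print(run_package(tasks, cp))  # only the failed task re-runs
```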
22. Configuration Scenario
Package handoff: packages are designed on Dev machines, tested on Test machines, and executed in Production
Configurations update the package on load with DB locations (and mail server, file share locations, …)
Multiple configurations point at the Dev DB, Test DB, and Prod DB
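The handoff scenario above can be sketched as the same package picking up per-environment settings at load time. The environment names, connection strings, and dictionary-based "package" are illustrative only.

```python
# Sketch of the configuration scenario: one unchanged package, three
# environments. Connection strings and names are made up.

CONFIGURATIONS = {
    "Dev":  {"db": "Server=DevDB;Database=Sales",  "mail": "dev-smtp"},
    "Test": {"db": "Server=TestDB;Database=Sales", "mail": "test-smtp"},
    "Prod": {"db": "Server=ProdDB;Database=Sales", "mail": "prod-smtp"},
}

def load_package(environment):
    """Apply the environment's configuration on load; the package itself
    is identical in every environment."""
    package = {"name": "LoadSales.dtsx", "connections": {}}
    package["connections"].update(CONFIGURATIONS[environment])
    return package

print(load_package("Test")["connections"]["db"])  # Server=TestDB;Database=Sales
```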
23. Precedence constraints
Directs flow from object to object… basically, "when do I move on?"
Success, Failure, Completion, or one of those plus an expression (condition)
Example: a Dataflow Task can branch to a SendMail Task on Failure, Success, Completion, or Success plus an expression
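The "status match plus optional expression" rule can be sketched as a small predicate. Representing the expression as a plain callable is an assumption for illustration; SSIS evaluates its own expression language.

```python
# Sketch: evaluating a precedence constraint per the slide's rule.

def should_move_on(constraint_type, task_status, expression=None):
    """constraint_type: 'Success', 'Failure', or 'Completion'.
    Completion matches any outcome; otherwise the status must match.
    An optional expression must also evaluate true."""
    status_ok = (constraint_type == "Completion") or (task_status == constraint_type)
    expr_ok = expression() if expression is not None else True
    return status_ok and expr_ok

# Dataflow Task succeeded; constraint is Success plus an expression.
print(should_move_on("Success", "Success", lambda: 3 > 1))  # True
print(should_move_on("Failure", "Success"))                 # False
print(should_move_on("Completion", "Failure"))              # True
```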
25. Deployment Flow
Tools to organize and "copy" packages and supporting files
BI Studio: design the package, add configurations, add miscellaneous files, set project deployment properties, build
User: copy/move the deployment folder files, then execute the Installation Wizard: choose the destination (SQL Server or file system), modify the protection level, choose the location of supporting files, change configurations
SQL Agent: create the desired agent jobs
26. SQL Server Management Studio
Utilizes the SSIS service
Allows monitoring of currently executing packages
Maintains stored package structure
Ad hoc package execution
28. SSIS: Summary
Fast!
Data flows process large volumes of data efficiently, even through complex operations
Exceptional price / performance on multi-core
Feature Rich
Many pre-built adapters and transformations reduce hand coding
Extensible object model enables specialized custom or scripted components
Highly productive visual environment speeds development and debugging
Integral part of a complete BI stack (IS-AS-RS)
Beyond ETL
Enables integration of XML, RSS and Web Services data
Data cleansing features enable “difficult” data to be handled during loading
Data and text mining allow "smart" handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection