This document provides an overview of SQL Server Integration Services (SSIS), including its lifecycle, performance considerations, and deployment. It discusses SSIS components, buffers and memory usage, optimization strategies using the OVAL method, measuring performance, and manageability features such as logging, configurations, and checkpoints. The deployment process and tools are also outlined. The presentation does not provide prescriptive guidance for specific situations.
3. Objectives and Takeaways
A high level view
Design considerations
How to measure performance
Performance implications of architecture
Manageability aspects of SSIS
Deployment tips
Out of scope
Prescriptive guidance for specific situations
4. Agenda
Quick Introduction
Understanding Buffers and Memory
OVAL Concept Detailed
Component Specific Notes
Manageability Features
Deployment Considerations
6. SSIS Life Cycle tools
Design the SSIS Package
Business Intelligence Development Studio (Visual Studio)
Migration wizard for pre-SQL Server 2005 (DTS) packages
Version Control Integration (VSS)
Deployment/Execution
Deployment Utility to copy packages
Command Line execution (dtexec.exe and dtexecui.exe)
Flexible Configuration Options
Supportability
Rich per package Logging
SQL Server Management Studio for monitoring running packages and organizing stored packages
Checkpoint - Restartability
8. Buffers and Memory
Buffers are based on design-time metadata
The width of a row determines the size of the buffer
Smaller rows = more rows in memory = greater efficiency
Memory copies are expensive!
A buffer might have placeholder columns filled by downstream components
Pointer magic where possible
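The relationship between row width and buffer capacity can be sketched as follows. This assumes the 2005-era Data Flow task defaults of DefaultBufferSize = 10 MB and DefaultBufferMaxRows = 10,000; the function itself is illustrative, not the engine's actual sizing code.

```python
# Sketch: how SSIS-style buffer sizing follows from row width.
# Assumed defaults: DefaultBufferSize = 10 MB, DefaultBufferMaxRows = 10,000
# (both are Data Flow task properties).

DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # bytes
DEFAULT_BUFFER_MAX_ROWS = 10_000

def rows_per_buffer(row_width_bytes: int) -> int:
    """Rows that fit in one buffer: capped by both size and row count."""
    return min(DEFAULT_BUFFER_MAX_ROWS, DEFAULT_BUFFER_SIZE // row_width_bytes)

# A narrow 100-byte row is capped by DefaultBufferMaxRows...
print(rows_per_buffer(100))    # 10000
# ...while a wide 4 KB row fits far fewer rows per buffer.
print(rows_per_buffer(4096))   # 2560
```

This is why narrower rows mean more rows in memory per buffer, and hence fewer buffers for the same volume of data.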
9. Component Types
Row-based (synchronous outputs)
Logically works at a row level; buffer reused
Examples: Data Conversion, Derived Column
Partially blocking (asynchronous outputs)
May logically work at a row level; data copied to new buffers
Examples: Merge, Merge Join, Union All
Blocking (asynchronous outputs)
Needs all input buffers before producing any output rows; data copied to new buffers
Examples: Aggregate, Sort
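The memory behavior that separates these component types can be sketched as follows: a synchronous component works on the buffer it was handed, while an asynchronous one hands back a new buffer. The functions and sample rows are hypothetical stand-ins, not SSIS APIs.

```python
# Sketch contrasting the component types above: a synchronous component
# mutates the buffer in place, while an asynchronous one copies rows
# into a new buffer.

def derived_column(buffer):           # row-based / synchronous: buffer reused
    for row in buffer:
        row["full"] = row["first"] + " " + row["last"]
    return buffer                     # the same object comes back

def sort_component(buffer, key):      # blocking / asynchronous: new buffer
    # Must consume all input before producing any output rows.
    return sorted(buffer, key=lambda r: r[key])

buf = [{"first": "Ada", "last": "B"}, {"first": "Al", "last": "A"}]
out = derived_column(buf)
print(out is buf)                          # True: no copy
print(sort_component(buf, "last") is buf)  # False: data copied
```

The copy is what makes blocking and partially blocking components comparatively expensive: "memory copies are expensive!"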
10. CPU Utilization
Execution Tree
Starts from a source or an async output
Ends at a destination or an input that has no sync outputs
Each execution tree can get a worker thread
EngineThreads property to control parallelism
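The tree-counting rule above can be sketched as a simple tally: every source and every asynchronous output starts a new execution tree, and each tree is a candidate for a worker thread. The component names and kinds below are made up for illustration.

```python
# Sketch: counting execution trees in a simplified data flow.
# Rule from the slide: a tree starts at a source or at an asynchronous
# output; synchronous components stay in the current tree.

def count_execution_trees(components):
    """components: list of (name, kind) where kind is 'source',
    'sync' (synchronous output), or 'async' (asynchronous output)."""
    return sum(1 for _, kind in components if kind in ("source", "async"))

flow = [
    ("Flat File Source",   "source"),
    ("Derived Column",     "sync"),   # row-based: same tree
    ("Sort",               "async"),  # blocking: starts a new tree
    ("OLE DB Destination", "sync"),
]
print(count_execution_trees(flow))  # 2 -> up to 2 worker threads
```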
11. Performance Strategy
Use OVAL to identify the factors affecting data integration performance…
Operations: What logic should be applied to the data?
Volume: How much data must be processed?
Application: Which application is best suited to these operations on this volume of data? For example, should SQL Server or SSIS sort the data?
Location: Where should the application run? For example, on a shared server, or on a standalone machine?
12. An OVAL Example: Loading a Text File
Simple scenario…
Text file on Server 1; SQL Server on Server 2
Interesting performance considerations!
13. Operations
Understand all operations performed
1. Open a transaction on SQL Server
2. Read data from the text file
3. Load data into the SSIS data flow
4. Load the data into SQL Server
5. Commit the transaction
Beware of hidden operations
Data conversion in either step 3 or 4
14. Volume
Reduce where possible
Don’t push unneeded columns
Conditional split for filtering rows
Do not parse or convert columns unnecessarily
In a fixed-width format you can combine adjacent unneeded columns into one
Leave unneeded columns as strings
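The volume-reduction advice above can be sketched as filtering rows and pruning columns before anything downstream sees them. The rows, column names, and filter predicate here are invented for illustration.

```python
# Sketch: reduce volume early, per the slide's advice. Rows and the
# region filter are hypothetical.

rows = [
    {"id": "1", "region": "EU", "amount": "10.5", "notes": "x" * 50},
    {"id": "2", "region": "US", "amount": "7.25", "notes": "y" * 50},
    {"id": "3", "region": "EU", "amount": "3.00", "notes": "z" * 50},
]

KEEP = ("id", "amount")  # don't push unneeded columns downstream

# Conditional-split-style row filter, then column pruning. Note that
# amount stays a string: no parsing or conversion until needed.
slim = [{c: r[c] for c in KEEP} for r in rows if r["region"] == "EU"]
print(slim)  # [{'id': '1', 'amount': '10.5'}, {'id': '3', 'amount': '3.00'}]
```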
15. Application
Is SSIS right for this?
Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets.
Is BCP good enough?
Is the greater manageability and control of SSIS needed?
Bulk Import Task vs. Data Flow
16. Location
Consider the following configuration …
Text file on Server 1; SQL Server on Server 2
Where should SSIS run?
(Licensing issues aside)
17. Measuring Performance
OVAL does not provide prescriptive guidance
Too many variables
Improve performance by applying OVAL and
measuring
SSIS Logging
Performance counters
SQL Server Profiler
For extract queries, lookups and loading
18. Parallelism
Focus on critical path
Utilize available resources
Identify the bottleneck: memory constrained, reader constrained, or CPU constrained
Let it rip! Optimize the slowest
20. Manageability Features
Logging and Log Providers
Checkpoint Restartability
Precedence Constraints
Configurations
SSIS Service
21. Checkpointing
Package loads: checkpoint file created
Data Flow Task completes: write checkpoint
Data Flow Task completes: write checkpoint
Send Mail Task completes: write checkpoint
Package completes: checkpoint file deleted
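The checkpoint sequence above can be sketched as a restartable runner: each completed task is recorded in a checkpoint file, a restart skips recorded tasks, and the file is deleted when the package finishes. The file name, task names, and failure injection are all hypothetical; this is not the SSIS checkpoint format.

```python
# Sketch of checkpoint-style restartability, assuming a made-up JSON
# checkpoint file rather than the real SSIS format.

import json, os, tempfile

def run_package(tasks, checkpoint_path, fail_at=None):
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)           # resume from the checkpoint
    executed = []
    for task in tasks:
        if task in done:
            continue                      # completed in a previous run
        if task == fail_at:
            raise RuntimeError(f"{task} failed")  # checkpoint file survives
        executed.append(task)
        done.append(task)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)            # write checkpoint after each task
    os.remove(checkpoint_path)            # package completed: file deleted
    return executed

tasks = ["Data Flow Task 1", "Data Flow Task 2", "Send Mail Task"]
cp = os.path.join(tempfile.mkdtemp(), "package.chk")
try:
    run_package(tasks, cp, fail_at="Send Mail Task")
except RuntimeError:
    pass
print(run_package(tasks, cp))  # only the failed task re-runs
```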
22. Configuration Scenario
Package handoff: packages are designed on Dev machines, tested on Test machines, and executed in Production
Configurations update the package on load with DB locations (and mail server, file share locations, …)
Multiple configurations point at the Dev DB, Test DB, and Prod DB
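The handoff scenario above can be sketched as the same package picking up per-environment settings at load time. The environment names, connection strings, and dictionary-based "package" are illustrative only.

```python
# Sketch of the configuration scenario: one unchanged package, three
# environments. Connection strings and names are made up.

CONFIGURATIONS = {
    "Dev":  {"db": "Server=DevDB;Database=Sales",  "mail": "dev-smtp"},
    "Test": {"db": "Server=TestDB;Database=Sales", "mail": "test-smtp"},
    "Prod": {"db": "Server=ProdDB;Database=Sales", "mail": "prod-smtp"},
}

def load_package(environment):
    """Apply the environment's configuration on load; the package itself
    is identical in every environment."""
    package = {"name": "LoadSales.dtsx", "connections": {}}
    package["connections"].update(CONFIGURATIONS[environment])
    return package

print(load_package("Test")["connections"]["db"])  # Server=TestDB;Database=Sales
```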
23. Precedence constraints
Directs flow from object to object… basically, "when do I move on?"
Success, Failure, Completion, or one of those plus an expression (condition)
Example: a Dataflow Task can branch to a SendMail Task on Failure, Success, Completion, or Success plus an expression
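The "status match plus optional expression" rule can be sketched as a small predicate. Representing the expression as a plain callable is an assumption for illustration; SSIS evaluates its own expression language.

```python
# Sketch: evaluating a precedence constraint per the slide's rule.

def should_move_on(constraint_type, task_status, expression=None):
    """constraint_type: 'Success', 'Failure', or 'Completion'.
    Completion matches any outcome; otherwise the status must match.
    An optional expression must also evaluate true."""
    status_ok = (constraint_type == "Completion") or (task_status == constraint_type)
    expr_ok = expression() if expression is not None else True
    return status_ok and expr_ok

# Dataflow Task succeeded; constraint is Success plus an expression.
print(should_move_on("Success", "Success", lambda: 3 > 1))  # True
print(should_move_on("Failure", "Success"))                 # False
print(should_move_on("Completion", "Failure"))              # True
```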
25. Deployment Flow
Tools to organize and "copy" packages and supporting files
BI Studio: design the package, add configurations, add miscellaneous files, set project deployment properties, build
User: copy/move the deployment folder files, then execute the Installation Wizard: choose the destination (SQL Server or file system), modify the protection level, choose the location of supporting files, change configurations
SQL Agent: create the desired agent jobs
26. SQL Server Management Studio
Utilizes the SSIS service
Allows monitoring of currently executing packages
Maintains stored package structure
Ad hoc package execution
28. SSIS: Summary
Fast!
Data flows process large volumes of data efficiently, even through complex operations
Exceptional price / performance on multi-core
Feature Rich
Many pre-built adapters and transformations reduce hand coding
Extensible object model enables specialized custom or scripted components
Highly productive visual environment speeds development and debugging
Integral part of a complete BI stack (IS-AS-RS)
Beyond ETL
Enables integration of XML, RSS and Web Services data
Data cleansing features enable “difficult” data to be handled during loading
Data and text mining allow "smart" handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection