What's new in SQL Server Integration Services 2012?
Nico Jacobs
Nico@U2U.be
@sqlwaldorf
WHAT IS SSIS?
• Extract data from source systems
  • SQL Server, Oracle, DB2, flat files, XML, Excel, …
• Transform data
  • Look up surrogate keys, clean data, reformat, …
• Load it into a destination database
  • Transactions, checkpoints, scalability, …
WHAT IS SSIS?
• The data flow reads data from one or more sources
• Data is pushed through a row-based pipeline
• Along the way, it can pass through one or more built-in or ad hoc transformations
  • Streaming transformations improve scalability
• Destinations write the data to disk, a database, …
• The control flow dictates the order in which tasks execute; the data flow is one of these tasks
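The streaming idea can be made concrete with a small Python analogy. It is only a sketch and does not show how SSIS is built: the real engine moves rows through in-memory buffers. Each component here is a generator, so a row passes through every transformation before the next row is read. The component names, columns and keys are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def source(rows: Iterable[Row]) -> Iterator[Row]:
    """Source stand-in: emits one row at a time."""
    for row in rows:
        yield dict(row)  # copy, so transformations never modify the caller's data

def clean_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming transformation: trims and capitalizes each name as it passes."""
    for row in rows:
        row["name"] = str(row["name"]).strip().title()
        yield row

def lookup_surrogate_key(rows: Iterable[Row], keys: Dict[str, int]) -> Iterator[Row]:
    """Streaming lookup: adds a surrogate key, or -1 when the business key is unknown."""
    for row in rows:
        row["customer_sk"] = keys.get(str(row["customer_code"]), -1)
        yield row

def destination(rows: Iterable[Row]) -> List[Row]:
    """Destination stand-in: collects the rows instead of writing them to disk or a database."""
    return list(rows)

# The control flow decides when this data flow runs; here we simply call it.
raw = [{"name": "  alice ", "customer_code": "C1"},
       {"name": "BOB", "customer_code": "C9"}]
loaded = destination(lookup_surrogate_key(clean_names(source(raw)), {"C1": 101}))
print(loaded)
```

No stage buffers the whole input. Memory use stays flat however many rows flow through, which is the scalability point made on the slide.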
WHAT’S NEW IN 2012?
• A lot!
• New stuff for package developers
• New stuff for package administration
• New stuff for package usage
• Let’s get started!
1: GUI IMPROVEMENTS
• Getting started window
• Package visualization
• Zoom
• Undo
• SSIS toolbox
• Data flow source/destination wizard
• Sort packages by name
• Grouping in data flow
CHANGE DATA CAPTURE
• An incremental load loads only the rows that have changed since the last load
• How do we know what has changed?
  • Compare every source row with every destination row
  • Maintain a last-modified date with a trigger
  • Change tracking
  • Change data capture!
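The first option, comparing every source row with every destination row, can be sketched as shown here. The sketch assumes both sides fit in memory as dictionaries keyed on a business key, and the sample data is hypothetical. Every load has to read both complete data sets, and change tracking and CDC exist to avoid exactly that cost.

```python
from typing import Dict, Hashable, List, Tuple

Rows = Dict[Hashable, Tuple]  # business key -> tuple of column values

def diff_rows(source: Rows, destination: Rows) -> Tuple[List, List, List]:
    """Full comparison: classify keys as new, changed or deleted."""
    inserted = [k for k in source if k not in destination]
    updated = [k for k in source if k in destination and source[k] != destination[k]]
    deleted = [k for k in destination if k not in source]
    return inserted, updated, deleted

src = {1: ("Alice", "Brussels"), 2: ("Bob", "Ghent"), 4: ("Dan", "Leuven")}
dst = {1: ("Alice", "Brussels"), 2: ("Bob", "Antwerp"), 3: ("Carol", "Mons")}
print(diff_rows(src, dst))  # ([4], [2], [3])
```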
CHANGE DATA CAPTURE
• Requires SQL Server Enterprise edition, 2008 or higher
• Asynchronous process
• Captures all changes
• Maintains a time window of changes
• CDC data is accessed via table-valued functions
• See Books Online, Change Data Capture
2: CDC TASK AND COMPONENTS
• CDC needs to keep track of which changes have already been processed
• The CDC Control task does this by storing LSNs in a tracking table
• The CDC Source component reads from the CDC table-valued function, based on the LSN it got from the CDC Control task
• The CDC Splitter transformation splits records into new rows, updated rows and deleted rows
• No documentation yet in RC0; check Matt Masson’s blog
• Based on the Attunity CDC components
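Here is a rough Python model of what these components do with change rows. It assumes rows shaped like the output of SQL Server's cdc.fn_cdc_get_all_changes_<capture_instance> functions. In those rows, __$operation is 1 for a delete, 2 for an insert, 3 for the old values of an update and 4 for the new values. LsnTracker is a hypothetical stand-in for the state the CDC Control task keeps. The model also assumes that each batch contains whole transactions.

```python
from typing import Dict, List, Tuple

# Values of the __$operation column in SQL Server CDC output.
DELETE, INSERT, UPDATE_OLD, UPDATE_NEW = 1, 2, 3, 4

def split_changes(changes: List[Dict]) -> Tuple[List[Dict], List[Dict], List[Dict]]:
    """Route change rows to insert, update and delete outputs, splitter-style."""
    inserts, updates, deletes = [], [], []
    for row in changes:
        op = row["__$operation"]
        if op == INSERT:
            inserts.append(row)
        elif op == UPDATE_NEW:
            updates.append(row)  # keep the after-image of an update
        elif op == DELETE:
            deletes.append(row)
        # UPDATE_OLD (before-image) rows are ignored in this sketch
    return inserts, updates, deletes

class LsnTracker:
    """Hypothetical stand-in for the processed-LSN state of the CDC Control task."""

    def __init__(self) -> None:
        self.last_lsn = b"\x00" * 10  # CDC LSNs are binary(10) values

    def process(self, changes: List[Dict]):
        """Split only rows newer than the last processed LSN, then remember the newest."""
        new = [r for r in changes if r["__$start_lsn"] > self.last_lsn]
        if new:
            self.last_lsn = max(r["__$start_lsn"] for r in new)
        return split_changes(new)
```

Calling process twice with the same rows returns empty outputs the second time. The tracked LSN provides that protection against reprocessing.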
3: MAPPING DATA FLOW COLUMNS
• When a data flow is modified, columns sometimes need to be remapped
• SSIS 2012 maps columns by name instead of by ID
• It also has an improved remapping dialog
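A toy model shows why name-based mapping helps. It assumes each input column remembers the numeric ID of the upstream column it was bound to, which SSIS tracks as lineage IDs. Replacing the upstream source gives every column a new ID, yet matching on names still finds most of them. The column names and IDs are made up.

```python
from typing import Dict, List, Tuple

def remap_by_name(inputs: Dict[str, int],
                  new_upstream: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Re-point each input column (name -> old upstream ID) at the upstream column
    that now has the same name; report names that cannot be mapped."""
    remapped, unmapped = {}, []
    for name in inputs:
        if name in new_upstream:
            remapped[name] = new_upstream[name]
        else:
            unmapped.append(name)
    return remapped, unmapped

# Replacing the source changes every ID, but most names survive.
inputs = {"CustomerID": 17, "Name": 18, "City": 19}
new_source = {"CustomerID": 42, "Name": 43, "Town": 44}
print(remap_by_name(inputs, new_source))  # ({'CustomerID': 42, 'Name': 43}, ['City'])
```

Only the renamed column ends up in the remapping dialog.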
4: ODBC SOURCE AND DESTINATION
• ODBC was not natively supported in SSIS 2008
• SSIS 2012 has an ODBC Source and an ODBC Destination
  • Handy for connecting to SQL Azure
  • Essential if SQL Server stops supporting OLE DB
• SSIS 2008 could access ODBC via ADO.NET:
  • Has a create table option, which ODBC lacks
  • No control over batch inserts
  • Low performance:
    Rows        ODBC      ADO.NET   % diff
    1000        0.42      2.12      405%
    10000       4.91      7.84      60%
    100000      49.2      78.36     59%
    1000000     481.65    781.28    62%
    (% diff = how much longer ADO.NET took than ODBC)
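The % diff column can be reproduced from the timings, which pins down its definition: it is the extra time ADO.NET needed, relative to ODBC.

```python
def pct_slower(time: float, baseline: float) -> int:
    """How much longer `time` took than `baseline`, as a rounded percentage."""
    return round((time / baseline - 1) * 100)

# (rows, ODBC, ADO.NET) timings from the comparison table
timings = [(1000, 0.42, 2.12), (10000, 4.91, 7.84),
           (100000, 49.2, 78.36), (1000000, 481.65, 781.28)]
for rows, odbc, adonet in timings:
    print(rows, f"{pct_slower(adonet, odbc)}%")
```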
REPLACE OLE DB WITH ODBC?
• After comparing ODBC with ODBC via ADO.NET, let’s test ODBC versus OLE DB
  • Bulk insert (% diff = how much longer ODBC took than plain OLE DB):
    Rows        OLE DB    OLE DB fast load   ODBC       % diff
    1000        0.15      0.07               0.865      477%
    10000       0.32      0.16               4.8        1400%
    100000      1.66      0.565              48.13      2799%
    1000000     12.485    9.12               483.085    3769%
  • Row by row (% diff = how much longer OLE DB took than ODBC; negative means OLE DB was faster):
    Rows        OLE DB    ODBC      % diff
    1000        0.62      0.76      -18%
    10000       9.15      6.28      46%
    100000      71.21     67.37     6%
    1000000     730.16    684.28    7%
Your mileage may vary…
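The two % diff columns use opposite baselines, which is easy to miss. This sketch recomputes both from the timings using (value / baseline - 1) × 100. In the bulk insert test, ODBC is measured against plain OLE DB. In the row-by-row test, OLE DB is measured against ODBC.

```python
def pct_diff(value: float, baseline: float) -> int:
    """(value / baseline - 1) * 100, rounded to a whole percentage."""
    return round((value / baseline - 1) * 100)

# Bulk insert: (rows, OLE DB, OLE DB fast load, ODBC)
bulk = [(1000, 0.15, 0.07, 0.865), (10000, 0.32, 0.16, 4.8),
        (100000, 1.66, 0.565, 48.13), (1000000, 12.485, 9.12, 483.085)]
# Row by row: (rows, OLE DB, ODBC)
row_by_row = [(1000, 0.62, 0.76), (10000, 9.15, 6.28),
              (100000, 71.21, 67.37), (1000000, 730.16, 684.28)]

bulk_diff = [pct_diff(odbc, oledb) for _, oledb, _fast, odbc in bulk]    # ODBC vs OLE DB
row_diff = [pct_diff(oledb, odbc) for _, oledb, odbc in row_by_row]      # OLE DB vs ODBC
print(bulk_diff)
print(row_diff)
```

The sign flip explains the one negative value. At 1000 rows, OLE DB finished faster than ODBC in the row-by-row test.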
5: SCRIPTING
• The Script task and Script component now support .NET 4.0
• Breakpoints are supported in the Script component
• When developing custom components, there is better back pressure support:
  • The SupportsBackPressure property and the IsInputReady and GetDependentInputs methods
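Back pressure matters most for components with several inputs, such as Merge. The Python analogy here is not the SSIS pipeline API and does not model the methods named on the slide. It merges two sorted inputs while holding at most one pending row per input, and it pulls the next row only from the input that can currently advance. Because of that, even an endless input cannot flood memory.

```python
from itertools import count, islice
from typing import Iterator

_END = object()  # marks an exhausted input

def merge_sorted(left: Iterator, right: Iterator) -> Iterator:
    """Merge two sorted inputs, reading from an input only when its row is consumed."""
    a, b = next(left, _END), next(right, _END)
    while a is not _END and b is not _END:
        if a <= b:
            yield a
            a = next(left, _END)
        else:
            yield b
            b = next(right, _END)
    # One input is exhausted: drain the other.
    while a is not _END:
        yield a
        a = next(left, _END)
    while b is not _END:
        yield b
        b = next(right, _END)

print(list(merge_sorted(iter([1, 4, 9]), iter([2, 3, 10]))))
print(list(islice(merge_sorted(iter([1, 2]), count(5)), 4)))  # endless right input
```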
6: EXPRESSION TASK
• The Script task can be used to modify variable values… but it’s overkill
• The Expression task provides a simple way to change variable values
DATA QUALITY SERVICES (DQS)
• DQS is a new service for cleaning domain data
• A domain knowledge base needs to be built
  • Based on rules and on positive and negative examples
  • Potentially using external data from the Azure Marketplace or other providers
7: DQS CLEANSING TRANSFORMATION
• Cleaning and standardizing data before it is loaded into the data warehouse is essential
• The DQS Cleansing transformation labels data in 4 categories:
  • Correct: a value accepted by the knowledge base
  • Corrected: a value that DQS is confident it can correct to a valid domain value
  • Suggested: a value for which DQS is less confident, but can still suggest a domain value
  • New: a value for which DQS has no suggestion
• See Koen Verbeeck’s session on DQS for more info!
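Here is a toy model of the four labels. It assumes the knowledge base is just a list of valid values and uses difflib's similarity ratio as the confidence score. DQS uses its own matching and configurable thresholds, so the 0.9 and 0.7 cut-offs below are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

def cleanse(value: str, domain: Iterable[str],
            auto_correct: float = 0.9, suggest: float = 0.7) -> Tuple[str, Optional[str]]:
    """Label a value Correct / Corrected / Suggested / New against a domain."""
    values = list(domain)
    if value in values:
        return "Correct", value
    best, score = None, 0.0
    for candidate in values:
        s = SequenceMatcher(None, value, candidate).ratio()
        if s > score:
            best, score = candidate, s
    if score >= auto_correct:   # confident enough to correct automatically
        return "Corrected", best
    if score >= suggest:        # less confident: only suggest
        return "Suggested", best
    return "New", None          # no usable match in the knowledge base

countries = ["Belgium", "Netherlands", "France"]
for v in ["Belgium", "Belgum", "Frnace", "Atlantis"]:
    print(v, cleanse(v, countries))
```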
8: PACKAGE CATALOG
• SSIS 2012 can work in the new project mode (the default) or in the old package mode (for backward compatibility)
• In project mode, many things change:
  • The project becomes the unit of deployment
  • Deployment to SQL Server becomes mandatory
  • Packages are not stored in msdb, but in a dedicated user database:
    o The package catalog, named SSISDB
  • Logging happens automatically and is stored in the package catalog
    o Custom logging is still supported
• Projects can be converted from one deployment type to the other
PACKAGE CATALOG
• Managed via SSMS, connected to the relational engine
• Fixed database name: SSISDB
• Stores projects, versions, logs, 5 reports, 25 views, 42 stored procedures, …
• This makes it possible to run, monitor and manage SSIS projects and packages via T-SQL!
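Because the catalog exposes stored procedures, starting a package takes a short T-SQL batch. To keep one example language, this Python sketch only builds that batch as text, which you can send with any SQL Server client. catalog.create_execution, catalog.start_execution and the catalog.executions view are SSISDB objects. The folder, project and package names are hypothetical, and the parameter lists are worth checking against the documentation for your version.

```python
def quote(value: str) -> str:
    """T-SQL Unicode string literal with embedded quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"

def run_package_sql(folder: str, project: str, package: str) -> str:
    """Build a T-SQL batch that creates and starts a catalog execution."""
    return "\n".join([
        "DECLARE @execution_id BIGINT;",
        "EXEC SSISDB.catalog.create_execution",
        f"    @folder_name = {quote(folder)},",
        f"    @project_name = {quote(project)},",
        f"    @package_name = {quote(package)},",
        "    @use32bitruntime = 0,",
        "    @execution_id = @execution_id OUTPUT;",
        "EXEC SSISDB.catalog.start_execution @execution_id;",
        "SELECT status FROM SSISDB.catalog.executions WHERE execution_id = @execution_id;",
    ])

print(run_package_sql("ETL", "Sales DW", "LoadCustomers.dtsx"))
```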
9: PARAMETERS
• Just two scopes:
  • Package
  • Project!
• Read-only
  • The value is set when the scope starts and cannot be changed afterwards
  • Can be set from SQL Server Data Tools configurations
• Often used together with environments
• Does not replace variables
  • It is more a replacement for package configurations
• Using the Visual Studio (SSDT) configurations, we can configure default values for testing
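This is a simplified model of the read-only behaviour. Values are resolved once, when the scope starts, and then frozen. The precedence used here is an assumption for illustration: the design-time default first, then a value set on the server, then a value supplied for one execution. The parameter names are also assumed.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional

def start_scope(design_defaults: Dict[str, object],
                server_values: Optional[Dict[str, object]] = None,
                execution_values: Optional[Dict[str, object]] = None) -> Mapping[str, object]:
    """Resolve parameter values once, when the scope starts, then freeze them."""
    values = dict(design_defaults)
    for overrides in (server_values or {}, execution_values or {}):  # assumed precedence
        for name, value in overrides.items():
            if name not in values:
                raise KeyError(f"unknown parameter: {name}")
            values[name] = value
    return MappingProxyType(values)  # read-only view: item assignment raises TypeError

params = start_scope({"BatchSize": 1000, "ServerName": "localhost"},
                     server_values={"ServerName": "PRODSQL01"})
print(params["ServerName"], params["BatchSize"])
```

Variables, by contrast, would be an ordinary mutable dictionary that tasks can update while the package runs.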
10: SHARED CONNECTION MANAGERS
• A shared connection manager is defined at project level and is automatically available in every package
  • Not copied into each package, as in SSIS 2008
• Shared connection managers can be parameterized as well
• When a shared connection manager is converted back to a regular (package) connection manager, it disappears from all other packages
• Shared cache connection managers are supported as well
  • This allows data to be cached in memory in one package and reused in multiple other packages
11: ENVIRONMENTS
• Environments replace package configurations
• They can control parameter values and connection strings
• Environments are created in the package catalog
  • They are not deployed to the server, but created on the server
  • Don’t forget to reference the environment at the project level
  • Script them while creating them; this makes it easier to create multiple environments
• A server might have multiple environments
  • When we execute a package, we can choose which environment to use
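Following the advice to script environments, a small generator can emit one T-SQL script per environment. The sketch makes several assumptions:
• Variables are strings only.
• The environment sits in the project's own folder and is referenced relatively ('R').
• Project parameters exist with the same names as the variables (hypothetical).
Check the catalog procedure signatures for your version before running the output.

```python
from typing import Dict

def quote(value: str) -> str:
    """T-SQL Unicode string literal with embedded quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"

def environment_script(folder: str, project: str, env: str,
                       variables: Dict[str, str]) -> str:
    """T-SQL that creates one environment with string variables, references it
    from the project, and binds same-named project parameters to the variables."""
    f, p, e = quote(folder), quote(project), quote(env)
    lines = [f"EXEC SSISDB.catalog.create_environment @folder_name = {f}, @environment_name = {e};"]
    for name, value in variables.items():
        lines.append(
            f"EXEC SSISDB.catalog.create_environment_variable @folder_name = {f}, "
            f"@environment_name = {e}, @variable_name = {quote(name)}, "
            f"@data_type = N'String', @sensitive = 0, @value = {quote(value)}, @description = N'';")
    lines.append("DECLARE @reference_id BIGINT;")
    lines.append(
        f"EXEC SSISDB.catalog.create_environment_reference @folder_name = {f}, "
        f"@project_name = {p}, @environment_name = {e}, @reference_location = 'R', "
        "@reference_id = @reference_id OUTPUT;")
    for name in variables:
        # Assumes a project parameter with the same name exists (hypothetical);
        # @value_type = 'R' makes the parameter refer to the environment variable.
        lines.append(
            f"EXEC SSISDB.catalog.set_object_parameter_value @object_type = 20, "
            f"@folder_name = {f}, @project_name = {p}, @parameter_name = {quote(name)}, "
            f"@parameter_value = {quote(name)}, @value_type = 'R';")
    lines.append("GO")  # SSMS/sqlcmd batch separator, so scripts can be concatenated
    return "\n".join(lines)

for env, server in [("DEV", "DEVSQL01"), ("PROD", "PRODSQL01")]:
    print(environment_script("ETL", "Sales DW", env, {"ServerName": server}))
```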
12: DATA TAPS
• Imagine a data viewer
  • That can be added on the runtime server
  • Without modifying the package, using T-SQL
  • That writes the data to disk instead of visualizing it…
• Voilà, you are now thinking of the data tap
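A data tap is added after an execution is created and before it is started. This sketch builds that T-SQL using catalog.add_data_tap. The task path, the data flow path identification string and the file name are hypothetical placeholders that must match objects in your own package.

```python
def quote(value: str) -> str:
    """T-SQL Unicode string literal with embedded quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"

def run_with_data_tap(folder: str, project: str, package: str, task_path: str,
                      path_id: str, file_name: str, max_rows: int = 1000) -> str:
    """Create an execution, tap one data flow path, then start the execution."""
    return "\n".join([
        "DECLARE @execution_id BIGINT, @data_tap_id BIGINT;",
        "EXEC SSISDB.catalog.create_execution "
        f"@folder_name = {quote(folder)}, @project_name = {quote(project)}, "
        f"@package_name = {quote(package)}, @execution_id = @execution_id OUTPUT;",
        # The tap must be added while the execution is created but not yet started.
        "EXEC SSISDB.catalog.add_data_tap @execution_id = @execution_id, "
        f"@task_package_path = {quote(task_path)}, "
        f"@dataflow_path_id_string = {quote(path_id)}, "
        f"@data_filename = {quote(file_name)}, @max_rows = {int(max_rows)}, "
        "@data_tap_id = @data_tap_id OUTPUT;",
        "EXEC SSISDB.catalog.start_execution @execution_id;",
    ])

print(run_with_data_tap("ETL", "Sales DW", "LoadCustomers.dtsx",
                        r"\Package\Data Flow Task",
                        "Paths[OLE DB Source.OLE DB Source Output]",
                        "customers_tap.csv"))
```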
13: AND A LOT MORE…
• .NET API and PowerShell
• Pivot and Row Count transformations get a user interface
• Flat files support
  • Embedded qualifiers
  • Variable number of columns (but still fixed metadata)
• Raw file improvements
  • Generate an empty raw file
  • Stores sort info
• DTSX files become more readable and ‘mergeable’
  • Sorted, filtered and pretty-printed
• Merge and Merge Join have improved back pressure handling
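Python's csv module handles the same two flat file cases, which makes it a compact illustration. It is not the SSIS Flat File parser. Doubled qualifiers inside a quoted field are unescaped, and short rows are padded with None so every row still fits the fixed column list. That padding is an assumed model of "variable number of columns, fixed metadata", and the column list is hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

COLUMNS = ["id", "name", "city"]  # fixed metadata (hypothetical)

def read_flat_file(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse quoted, comma-delimited text; pad short rows with None."""
    rows = []
    for fields in csv.reader(io.StringIO(text), quotechar='"'):  # "" inside quotes -> "
        if len(fields) > len(COLUMNS):
            raise ValueError(f"too many columns: {fields}")
        padded: List[Optional[str]] = fields + [None] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, padded)))
    return rows

data = '1,"Smith, ""Bob""",Leuven\n2,Jones\n'
for row in read_flat_file(data):
    print(row)
```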
AND A LOT MORE…
• The 4000-character expression length limit has been lifted
• New expression language functions
  • LEFT(expr, n) as syntactic sugar for SUBSTRING(expr, 1, n)
  • TOKEN and TOKENCOUNT for shredding strings
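Here are Python analogues of the new functions. This TOKEN model treats every character in the delimiter string as a delimiter and skips empty tokens, strtok-style. That behaviour is an assumption here. Check the edge cases, such as consecutive delimiters and out-of-range occurrences, against the SSIS expression reference.

```python
import re
from typing import List

def left(s: str, n: int) -> str:
    """LEFT(s, n), i.e. SUBSTRING(s, 1, n): the first n characters."""
    return s[:n]

def _tokens(s: str, delimiters: str) -> List[str]:
    """Split on any delimiter character, dropping empty tokens (assumed behaviour)."""
    if not delimiters:
        return [s] if s else []
    return [t for t in re.split("[" + re.escape(delimiters) + "]", s) if t]

def token(s: str, delimiters: str, occurrence: int) -> str:
    """1-based token number `occurrence`, or "" when there is no such token."""
    parts = _tokens(s, delimiters)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""

def tokencount(s: str, delimiters: str) -> int:
    """Number of non-empty tokens."""
    return len(_tokens(s, delimiters))

print(token("a little white dog", " ", 2), tokencount("a little white dog", " "))
print(token("2012-03-07", "-", 1), left("Integration", 5))
```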
SUMMARY
• Improved GUI
• Change data capture support
• Easy column remapping
• ODBC connections
• .NET 4.0 support & script component debugging
• Expression Task
• Data Quality Cleansing
• Package catalog
• Parameters
• Shared Connection Managers
• Environments
• Data Taps
• And a lot more…