SSN0020: SSIS 2012 for Beginners



  1. 20th SQL Saturday Night: SQL Server 2012 Integration Services for Beginners, June 01, 2013
  2. SQL Server 2012 Integration Services for Beginners, 20th SQL Saturday Night, May 25, 2013
  3. Antonios Chatzipavlis, Solution Architect, SQL Server Evangelist (SP_WHO)
     • I started working with computers in 1982.
     • In 1988 I began my professional career in the computer industry.
     • In 1996 I started working with SQL Server version 6.0.
     • In 1998 I earned my first Microsoft certification, Microsoft Certified Solution Developer (3rd in Greece).
     • Since then I have earned more than 30 certifications (and counting!): MCT, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA, MCSE: Data Platform.
     • In 1998 I started my career as a Microsoft Certified Trainer (MCT); since then I have delivered more than 15,000 hours of training.
     • In 2010 I became a Microsoft MVP on SQL Server for the first time.
     • In 2010 I created SQL School Greece.
     • I am a board member of the IAMCT organization and Leader of its Greek Chapter.
     • In 2012 I became an MCT Regional Lead in the Microsoft Learning Program.
     • I am a moderator and member of dotnetzone.gr.
  4. COMMUNICATION: @antoniosch, @sqlschool, SQL School, help@sqlschool.gr
  5. Agenda: SQL Server 2012 Integration Services for Beginners
  6. AGENDA
     • Introduction to SSIS
     • SSIS Tools
     • Variables, Parameters, Expressions
     • SSIS Tasks
     • Containers
     • Data Flows
  7. Introduction to SSIS (SQL Server 2012 Integration Services for Beginners)
  8. WHAT IS SSIS?
     • A platform for ETL operations
     • Installed as a feature of SQL Server
     • Useful for DBAs, developers, and data analysts
     • The evolution of DTS
  9. SSIS ARCHITECTURE (1)
     • SSIS Project
       • A versioned container for parameters and packages
       • A unit of deployment to an SSIS Catalog
     • SSIS Package
       • A unit of task flow execution
       • A unit of deployment (package deployment model)
  10. SSIS ARCHITECTURE (2): Control Flow
     • The brain of the package; orchestrates the order of execution for all its components
     • Tasks: individual units of work
     • Precedence constraints: direct tasks to execute in a given order and define the workflow of the SSIS package
     • Containers: core units for grouping tasks together logically into units of work; give us the ability to define variables and events within container scope
  11. SSIS ARCHITECTURE (3): Data Flow
     • The heart of the package; provides the capability to implement ETL
     • Sources: components that specify the location of the source data
     • Transformations: allow changes to the data within the data pipeline
     • Destinations: components that specify the destination of the transformed data
  12. SSIS ARCHITECTURE (4)
     • Variables: SSIS variables can be set to evaluate to an expression at runtime
     • Parameters
       • Behave much like variables, with a few main exceptions
       • Can make a package dynamic
       • Can be set outside the package easily
       • Can be designated as values that must be passed in for the package to start
       • New to SSIS in SQL Server 2012; they replace the capabilities of Configurations in previous releases of SQL Server
     • Error handling and logging
  13. SSIS Tools (SQL Server 2012 Integration Services for Beginners)
  14. IMPORT/EXPORT WIZARD
     • The easiest method to move data
     • Uses SSIS as a framework
     • Optionally, we can save a package
  15. Import/Export Wizard
  17. Using SQL Server Data Tools to create an SSIS Project & Package
  18. Variables, Parameters, Expressions (SQL Server 2012 Integration Services for Beginners)
  19. SSIS DATA TYPES
     • Named differently than similar types in .NET or T-SQL
     • Mapping files: C:\Program Files\Microsoft SQL Server\110\DTS\MappingFiles
     • The .NET managed types are important only if you are using:
       • The Script component
       • CLR
       • .NET-based coding to manipulate your Data Flows
  20. SSIS NUMERIC DATA TYPES TABLE MAPPINGS

      SSIS DATA TYPE   SQL SERVER DATA TYPE   .NET DATA TYPE
      DT_NUMERIC       numeric                System.Decimal
      DT_DECIMAL       decimal
      DT_CY            numeric, decimal
      DT_I1                                   System.SByte
      DT_I2            smallint               System.Int16
      DT_I4            int                    System.Int32
      DT_BOOL          bit                    System.Boolean
      DT_I8            bigint                 System.Int64
      DT_R4            real                   System.Single
      DT_R8            float                  System.Double
      DT_U1            tinyint                System.Byte
      DT_U2                                   System.UInt16
      DT_U4                                   System.UInt32
      DT_U8                                   System.UInt64
      DT_GUID          uniqueidentifier       System.Guid
  21. SSIS STRING AND DATE-TIME DATA TYPES TABLE MAPPINGS

      SSIS DATA TYPE             SQL SERVER DATA TYPE      .NET DATA TYPE
      DT_WSTR                    nvarchar, nchar           System.String
      DT_STR                     varchar, char
      DT_TEXT                    text
      DT_NTEXT                   ntext, sql_variant, xml
      DT_BYTES                   binary, varbinary         System.Byte()
      DT_IMAGE                   timestamp, image
      DT_DBTIMESTAMP             smalldatetime, datetime   System.DateTime
      DT_DBTIMESTAMP2            datetime
      DT_DBDATE                  date
      DT_DATE
      DT_FILETIME
      DT_DBDATETIMESTAMPOFFSET   datetimeoffset
      DT_DBTIME2                 time                      System.TimeSpan
      DT_DBTIME
  22. PERFORMANCE AND DATA TYPES
     • Convert only when necessary.
       • Don't convert columns from a data source that will be dropped from the data stream.
       • Each conversion costs something.
     • Convert to the closest type for your destination using the mapping files.
       • If a value is converted to a non-supported data type, you'll incur an additional conversion internal to SSIS to the mapped data type.
     • Convert using the closest size and precision.
       • Don't import all columns as 50-character data columns if you are working with a fixed or reliable file format with columns that don't require as much space.
     • Evaluate the option to convert after the fact.
       • Remember that SSIS is still an ETL tool, and sometimes it is more efficient to stage the data and convert it using set-based methods.
  23. UNICODE AND NON-UNICODE
     • The Unicode data type is the default
       • Default import behavior
       • All the SSIS string functions expect Unicode strings as input
     • Use the Data Conversion Transformation to convert non-Unicode data types to the appropriate Unicode data type according to the mapping table
  24. CONVERSION IN SSIS EXPRESSIONS
     • Use explicit casting to avoid trouble
     • Casting is easy; it looks like .NET: (DT_I4) 32
     • Casting operator parameters:
       • Length: final string length
       • Code_Page: Unicode character set
       • Precision
       • Scale
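To illustrate the cast syntax, a few sketches in the SSIS expression language: a string cast to a 4-byte integer, an integer cast to an 8-character ANSI string on code page 1252, a column cast to a 10-character Unicode string, and a variable cast to numeric with precision 10 and scale 2. The column and variable names are hypothetical.

```text
(DT_I4) "32"
(DT_STR, 8, 1252) 123
(DT_WSTR, 10) [MyColumn]
(DT_NUMERIC, 10, 2) @[User::MyVar]
```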
  25. VARIABLES
     • A key feature in SSIS
     • User variables
       • Variables created by an SSIS developer to hold dynamic values
       • Defined in the User namespace by default
       • Defined at a specified scope
       • Syntax: User::MyVar
     • System variables
       • Built-in variables with dynamic system values
       • Defined in the System namespace
       • Syntax: System::MyVar
  26. PARAMETERS
     • A new feature in SQL Server 2012
     • Similar to a variable: can store information, but has a few different properties and uses
     • Parameters are set externally
     • Project parameters and package parameters
     • Replace package configurations when using the project deployment model
     • The Required property of a parameter forces the caller of the package to pass in a value for the parameter
  27. PROJECT PARAMETERS
     • Created at the project level
     • Can be used in all packages that are included in that project
     • Best used for values that should be shared among packages, such as e-mail addresses for error messages
  28. PACKAGE PARAMETERS
     • Created at the package level
     • Can be used only in that package
     • Best used for values specific to that package, such as directory locations
  29. PARAMETERS VS VARIABLES
     • Parameters: if you want to set the value of something from outside the package
     • Variables: if you want to create or store values only within a package
  30. VARIABLE & PARAMETER DATA TYPES

      VARIABLE DATA TYPE   SSIS DATA TYPE   REMARKS
      Boolean              DT_BOOL          Be careful setting these data types in code, because the expression language and .NET languages define them differently
      Byte                 DT_UI1           A 1-byte unsigned int (note: this is not a byte array)
      Char                 DT_UI2
      DateTime             DT_DBTIMESTAMP
      DBNull               N/A
      Double               DT_R8
      Int16                DT_I2
      Int32                DT_I4
      Int64                DT_I8
      Object               N/A              An object reference; typically used to store data sets or large object structures
      SByte                DT_I1
      Single               DT_R4
      String               DT_WSTR
      UInt32               DT_UI4
      UInt64               DT_UI8
  31. EXPRESSIONS
     • Used to set values dynamically:
       • Properties
       • Conditional Split criteria
       • Derived Column values
       • Precedence constraints
     • Based on the Integration Services expression syntax
     • Can include variables and parameters
     • The language is heavily C#-like, but contains a Visual Basic flavor and sometimes T-SQL for functions
     • Can be created graphically by using Expression Builder
     • Expression adorners are a new feature in SQL Server 2012
  32. EXPRESSION OPERATORS
     • ||  Logical OR
     • &&  Logical AND
     • ==  Determine equivalency
     • !=  Determine inequality
     • ? : Conditional operator
  33. EXPRESSION BUILDER
     • Friendly UI
     • Provides syntax checking and expression evaluation
  34. EXPRESSION SYNTAX BASICS
     • Equivalence operator (==)
     • String concatenation (+)
     • Line continuation (\n)
     • Literals
       • Numeric literals
         • 30L -> DT_I8
         • 30U -> DT_R8
         • 30.0F -> DT_R4
         • 30.3E -> DT_R8
       • String literals
         • \n CRLF
         • \t TAB
         • \" Quotation mark
         • \\ Backslash
       • Boolean literals
         • @User::Var == True
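Putting these pieces together, two typical expressions: one concatenating string literals with a variable to build a file path (note the escaped backslashes), and one using the conditional operator. The variable names are hypothetical.

```text
"C:\\Exports\\" + @[User::FileName] + ".csv"
@[User::RowCount] > 0 ? "HasRows" : "Empty"
```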
  35. DEALING WITH NULL IN SSIS
     • In SSIS, a variable cannot be set to NULL
     • The ISNULL SSIS function is not the same as T-SQL ISNULL
     • Each data type maintains a default value

      VARIABLE DATA TYPE   DEFAULT VALUE   VARIABLE DATA TYPE   DEFAULT VALUE
      Boolean              False           DBNull               N/A in an expression
      Byte                 0               Object               N/A in an expression
      Char                 0               String               ""
      DateTime             30/12/1899      SByte                0
      Double               0               Single               0
      Int16                0
      Int32                0               UInt32               0
      Int64                0               UInt64               0
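The difference matters in practice: SSIS ISNULL() only tests for NULL and returns a Boolean, so replacing a NULL takes the conditional operator, or the REPLACENULL() function added in SQL Server 2012. A sketch, with a hypothetical [MiddleName] column:

```text
ISNULL([MiddleName]) ? "N/A" : [MiddleName]
REPLACENULL([MiddleName], "N/A")
```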
  36. STRING FUNCTIONS
     • SSIS string functions are Unicode
     • They differ from the SQL Server string functions
     • Comparison is case and padding sensitive
       • Comparing strings requires two strings of the same padding length and case
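Because comparison is case and padding sensitive, a common pattern is to normalize both sides before comparing. A sketch with hypothetical column names:

```text
UPPER(TRIM([FirstName])) == UPPER(TRIM([MatchName]))
```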
  37. USING EXPRESSIONS IN AN SSIS PACKAGE
     • Variables and parameters as expressions
     • Expressions in Connection Manager properties
     • Expressions in Control Flow tasks
     • Expressions in Control Flow precedence
     • Expression Task (new in SQL Server 2012)
     • Expressions in the Data Flow
  38. SSIS Tasks (SQL Server 2012 Integration Services for Beginners)
  39. WHAT IS A TASK?
     • A foundation of the Control Flow in SSIS
     • A discrete unit of work that can perform typical actions
     • Each task has a set of setup parameters
       • Setup parameters are visible in the Task Editor
       • Most of these parameters can be made dynamic by using expressions on the Task Editor Expressions tab
  40. CONTROL FLOW TASKS
     • Data Flow Tasks: Data Flow
     • Database Tasks: Data Profiling, Bulk Insert, Execute SQL, Execute T-SQL, CDC Control
     • File & Internet Tasks: File System, FTP, XML, Web Service, Send Mail
     • Process Execution Tasks: Execute Package, Execute Process
     • WMI Tasks: WMI Data Reader, WMI Event Watcher
     • Custom Logic Tasks: Script, Custom Tasks
     • Database Transfer Tasks: Transfer Database, Transfer Error Messages, Transfer Jobs, Transfer Logins, Transfer Master Stored Procedures, Transfer SQL Server Objects
     • Analysis Services Tasks: Analysis Services Execute DDL, Analysis Services Processing, Data Mining Query
     • SQL Server Maintenance Tasks: Back Up Database, Check Database Integrity, History Cleanup, Maintenance Cleanup, Notify Operator, Rebuild Index, Reorganize Index, Shrink Database, Update Statistics
  41. PRECEDENCE CONSTRAINTS
     • Connect sequences of tasks
     • Three control flow conditions: Success, Failure, Completion
     • Multiple constraints: Logical AND, Logical OR
     (diagram: tasks connected by Success/Failure/Completion constraints, combined with AND and OR)
  42. TASK EDITOR
     • Easy to access
     • Consistent design
     • Name and Description generic properties
  43. TASK EDITOR
     • SSIS uses a concept of setting the value of most task properties to a dynamic expression that is evaluated at runtime
  45. COMMON PROPERTIES (1): DelayValidation
     • If set to true, SSIS will not validate any of the properties set in the task until runtime.
     • This is useful if you are operating in a disconnected mode, want to enter a value for production that cannot be validated until the package is deployed, or are dynamically setting the properties using expressions.
     • The default value for this property is false.
  46. COMMON PROPERTIES (2): Description
     • The description of what the instance of the task does.
     • The default is <task name>; if you have multiple tasks of the same type, it reads <task name 1> (where the number 1 increments).
     • This property does not have to be unique and is optional.
     • If you do provide details here, they display in the tooltip when hovering over the task object.
     • For consistency, the property should accurately describe what the task does for people who may be monitoring the package in your operations group.
  47. COMMON PROPERTIES (3): Disable
     • If set to true, the task is disabled and will not execute.
     • Helpful if you are testing a package and want to disable the execution of a task temporarily.
     • The equivalent of commenting a task out temporarily.
     • Set to false by default.
  48. COMMON PROPERTIES (4): ExecValueVariable
     • Contains the name of the custom variable that will store the output of the task's execution.
     • The default value of this property is <none>, which means the execution output is not stored.
     • Enables the task to expose information related to the results of the internal actions within the task.
  49. COMMON PROPERTIES (5): FailPackageOnFailure
     • If set to true, the entire package will fail if the individual task fails.
     • Typically, you want to control what happens to a package if a task fails with a custom error handler or Control Flow.
     • By default, this property is set to false.
  50. COMMON PROPERTIES (6): FailParentOnFailure
     • If set to true, the task's parent will fail if the individual task reports an error.
     • The task's parent can be a package or a container.
     • By default, this property is set to false.
  51. COMMON PROPERTIES (7): ID
     • A read-only, automatically generated unique ID associated with an instance of a task.
     • The ID is in GUID format, e.g. {BK4FH3I-RDN3-I8RF-KU3F-JF83AFJRLS}.
  52. COMMON PROPERTIES (8): IsolationLevel
     • Specifies the isolation level of the transaction.
     • The values are Chaos, ReadCommitted, ReadUncommitted, RepeatableRead, Serializable, Unspecified, and Snapshot.
     • The default value of this property is Serializable.
  53. COMMON PROPERTIES (9): LoggingMode
     • Specifies the type of logging that will be performed for this task.
     • The values are UseParentSetting, Enabled, and Disabled.
     • The default value is UseParentSetting.
     • Basic logging is turned on at the package level by default in SQL Server 2012.
  54. COMMON PROPERTIES (10): Name
     • The name associated with the task.
     • The default is <task name>; if you have multiple tasks of the same type, <task name 1> (where the number 1 increments).
     • Change this name to make it more readable to an operator at runtime.
     • It must be unique inside your package.
     • Used to identify the task programmatically.
  55. COMMON PROPERTIES (11): TransactionOption
     • Specifies the transaction attribute for the task.
     • Possible values are NotSupported, Supported, and Required.
     • The default value is Supported, which enables the option for you to use transactions in your task.
  56. DATA FLOW TASK
     • The heart of SSIS
     • Has its own design surface
     • Encapsulates all the data transformation aspects of ETL
     • Each Data Flow task corresponds to a separate Data Flow surface
     • Splits and handles data in pipelines based on data elements
  57. Database Tasks
     • Data Profiling
     • Bulk Insert
     • Execute SQL
     • Execute T-SQL
     • CDC Control
  58. DATA PROFILER TASK
     • Examines data and collects metadata about the quality of the data
       • About the frequency of statistical patterns, interdependencies, uniqueness, and redundancy
       • Important for the overall quality and health of an operational data store (ODS) or data warehouse
     • Doesn't have built-in conditional workflow logic, but technically you can use XPath queries on the results
     • Creates a report on data statistics; you still need to make judgments about these statistics
       • For example, a column may contain an overwhelming amount of NULL values, but the profiler doesn't know whether this reflects a valid business scenario
     • Examining areas:
       • Candidate Key Profile Request
       • Column Length Distribution Profile
       • Column Null Ratio Profile Request
       • Column Pattern Profile Request
       • Column Statistics Profile Request
       • Functional Dependency Profile Request
       • Value Inclusion Profile Request
  59. Data Profiler Task
  60. BULK INSERT TASK
     • Inserts data from a text or flat file into a SQL Server database
     • Similar to the BULK INSERT statement or BCP.exe
     • A very fast operation, especially with large amounts of data
     • In fact, this is a wizard to store the information needed to create and execute a bulk copy command at runtime
     • Has no ability to transform data; because of this, it gives us the fastest way to load data
  61. EXECUTE SQL TASK
     • One of the most widely used tasks in SSIS
     • Used for:
       • Truncating a staging data table prior to importing
       • Retrieving row counts to determine the next step in a workflow
       • Calling stored procedures to perform business logic against sets of staged data
       • Retrieving information from a database repository
     • Executing a parameterized SQL statement:
       • ? indicates a parameter when using ADO, ODBC, OLE DB, and EXCEL
       • ADO, OLE DB, and EXCEL are zero-based; ODBC starts from 1
       • @<Real Param Name> when using ADO.NET
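For example, with an OLE DB connection the statement uses positional ? markers, which are mapped to variables as parameters 0 and 1 on the task's Parameter Mapping page. The table and column names here are hypothetical:

```sql
UPDATE dbo.StageOrders
SET    ProcessedDate = ?   -- parameter 0
WHERE  BatchId = ?;        -- parameter 1
```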
  62. Bulk Insert Task & Execute SQL Task
  63. EXECUTE T-SQL STATEMENT TASK
     • Similar to the Execute SQL Task
     • Supports only T-SQL for SQL Server
  64. CDC CONTROL TASK
     • Used to control the life cycle of change data capture (CDC)
     • Handles CDC package synchronization
     • Maintains the state of the CDC package
     • Supports two groups of operations:
       • One group handles the synchronization of initial load and change processing
       • The other manages the change-processing range of LSNs for a run of a CDC package and keeps track of what was processed successfully
  65. File & Internet Tasks
     • File System
     • FTP
     • XML
     • Web Service
     • Send Mail
  66. FILE SYSTEM TASK
     • Performs file operations available in the System.IO.File .NET class.
     • The creation of directory structures does not have to be made recursively as we did in the DTS legacy product.
     • It is written for a single operation; if you need to iterate over a series of files or directories, the File System Task can simply be placed within a looping container.
  67. File System Task
  68. FTP TASK
     • Enables the use of the File Transfer Protocol (FTP) in your package development tasks.
     • Exposes more FTP command capability, enabling you to create or remove local and remote directories and files.
     • Another change from the legacy DTS FTP Task is the capability to use FTP in passive mode.
       • This solves the problem DTS had in communicating with FTP servers when firewalls filtered the incoming data port connection to the server.
  69. XML TASK
     • Validate, modify, extract, and create files in an XML format.
  70. XML Task
  71. WEB SERVICE TASK
     • Used to retrieve XML-based result sets by executing a method on a web service
     • Only retrieves the data; it doesn't yet address the need to navigate through the data or extract sections of the resulting documents
       • Can be used in SSIS to provide real-time validation of data in your ETL processes or to maintain lookup or dimensional data
     • Requires the creation of an HTTP Connection Manager
       • To a specific HTTP endpoint on a website or to a specific Web Services Description Language (WSDL) file on a website
  72. SEND MAIL TASK
     • Sends e-mail messages via SMTP
     • Only supports Windows and Anonymous authentication
       • Google mail needs basic authentication, so you cannot configure an SMTP Connection Manager in SSIS with an external SMTP server like Gmail, Yahoo, etc.
  73. Send Mail Task
  74. Process Execution Tasks
     • Execute Package
     • Execute Process
  75. EXECUTE PACKAGE TASK
     • Enables you to build SSIS solutions called parent packages that execute other packages called child packages
     • Several improvements have simplified the task:
       • The child packages can be run as either in-process or out-of-process executables.
       • A big difference in this release of the task compared to its 2008 or 2005 predecessors is that you execute packages within a project, making migrating the code from development to QA much easier.
       • The task now also enables you to easily map parameters in the parent package to the child packages.
  76. EXECUTE PROCESS TASK
     • Executes a Windows or console application inside the Control Flow.
       • The most common example is unzipping packed or encrypted data files with a command-line tool.
     • The configuration items for this task are: RequireFullFileName, Executable, WorkingDirectory, StandardInputVariable, StandardOutputVariable, StandardErrorVariable, FailTaskIfReturnCodeIsNotSuccessValue, Timeout, TerminateProcessAfterTimeOut, WindowStyle
  77. Process Execution Tasks
     • Execute Package
     • Execute Process
  78. WMI Tasks
     • WMI Data Reader
     • WMI Event Watcher
  79. WMI DATA READER TASK
     • WMI is one of the best-kept secrets in Windows
       • Enables you to manage Windows servers and workstations through a scripting interface similar to running a T-SQL query
     • This task enables you to interface with this environment by writing WQL queries
       • The output of the query can be written to a file or variable for later consumption
     • You could use the WMI Data Reader Task to:
       • Read the event log looking for a given error.
       • Query the list of applications that are running.
       • Query how much RAM is available at package execution for debugging.
       • Determine the amount of free space on a hard drive.
  80. WMI EVENT WATCHER TASK
     • Empowers SSIS to wait for and respond to certain WMI events that occur in the operating system.
     • The following are some of the useful things you can do with this task:
       • Watch a directory for a certain file to be written.
       • Wait for a given service to start.
       • Wait for the memory of a server to reach a certain level before executing the rest of the package or before transferring files to the server.
       • Watch for the CPU to be free.
  81. WMI Data Reader
  82. Custom Logic Tasks
     • Script
     • Custom Tasks
  83. SCRIPT TASK
     • Enables you to access the VSTA environment to write and execute scripts using the VB and C# languages
     • In the latest SSIS edition:
       • Solidifies the connection to the full .NET 4.0 libraries for both VB and C#
       • A coding environment with the advantage of IntelliSense
       • An integrated Visual Studio design environment within SSIS
       • An easy-to-use methodology for passing parameters into the script
       • The capability to add breakpoints to your code for testing and debugging purposes
       • The automatic compiling of your script into binary format for increased speed
  84. Script Task
  85. CUSTOM TASK
     • In a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet
     • Use Visual Studio and the Class Library project template
     • Reference the following assemblies:
       • Microsoft.SqlServer.DTSPipelineWrap
       • Microsoft.SqlServer.DTSRuntimeWrap
       • Microsoft.SqlServer.ManagedDTS
       • Microsoft.SqlServer.PipelineHost
     • In addition, the component needs to:
       • Provide a strong name key for signing the assembly.
       • Set the build output location to the PipelineComponents folder.
       • Use a post-build event to install the assembly into the GAC.
       • Set assembly-level attributes in the AssemblyInfo.cs file.
  86. Database Transfer Tasks
     • Transfer Database
     • Transfer Error Messages
     • Transfer Jobs
     • Transfer Logins
     • Transfer Master Stored Procedures
     • Transfer SQL Server Objects
  93. Analysis Services Tasks
     • Analysis Services Execute DDL
     • Analysis Services Processing
     • Data Mining Query
  97. SQL Server Maintenance Tasks
     • Back Up Database
     • Check Database Integrity
     • History Cleanup
     • Maintenance Cleanup
     • Notify Operator
     • Rebuild Index
     • Reorganize Index
     • Shrink Database
     • Update Statistics
  107. Containers (SQL Server 2012 Integration Services for Beginners)
  108. SEQUENCE CONTAINER
     • Handles the flow of a subset of a package
     • Can help you divide a package into smaller and more manageable pieces
     • Uses of the Sequence Container include:
       • Grouping tasks so that you can disable a part of the package that's no longer needed
       • Narrowing the scope of a variable to a container
       • Managing the properties of multiple tasks in one step by setting the properties of the container
       • Using one method to ensure that multiple tasks have to execute successfully before the next task executes
       • Creating a transaction across a series of data-related tasks, but not on the entire package
       • Creating event handlers on a single container, wherein you could send an e-mail if anything inside one container fails, and perhaps page if anything else fails
  109. GROUPS
     • Not actually containers, but simply a way to group components together
     • A key difference between groups and containers is that properties cannot be delegated through a group:
       • Groups don't have precedence constraints originating from them (only from the tasks).
       • You cannot disable the entire group, as you can with a Sequence Container.
     • Groups are good for quick compartmentalization of tasks for aesthetics.
     • Their only usefulness is to quickly group components in either a Control Flow or a Data Flow together.
  110. FOR LOOP CONTAINER
     • Enables you to create looping in your package, similar to how you would loop in nearly any programming language.
     • In this looping style, SSIS optionally initializes an expression and continues to evaluate it until the expression evaluates to false.
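A sketch of the container's three expression properties for a loop that runs ten times, assuming an Int32 variable User::Counter exists in the package:

```text
InitExpression:    @[User::Counter] = 0
EvalExpression:    @[User::Counter] < 10
AssignExpression:  @[User::Counter] = @[User::Counter] + 1
```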
  111. FOREACH LOOP CONTAINER
     • Enables you to loop through a collection of objects:
       • Foreach File Enumerator: performs an action for each file in a directory with a given file extension
       • Foreach Item Enumerator: loops through a list of items that are set manually in the container
       • Foreach ADO Enumerator: loops through a list of tables or rows in a table from an ADO recordset
       • Foreach ADO.NET Schema Rowset Enumerator: loops through an ADO.NET schema
       • Foreach From Variable Enumerator: loops through an SSIS variable
       • Foreach NodeList Enumerator: loops through a node list in an XML document
       • Foreach SMO Enumerator: enumerates a list of SQL Management Objects (SMO)
     • As you loop through the collection, the container assigns the value from the collection to a variable, which can later be used by tasks or connections inside or outside the container.
  112. Containers
  113. Data Flows (SQL Server 2012 Integration Services for Beginners)
  114. DATA FLOW TASK
     • The heart of SSIS
     • Has its own design surface
     • Encapsulates all the data transformation aspects of ETL
     • Each Data Flow task corresponds to a separate Data Flow surface
     • Splits and handles data in pipelines based on data elements
  115. DATA SOURCES
     • Databases: ADO.NET, OLE DB, CDC Source
     • Files: Excel, flat files, XML, raw files
     • Others: custom
  116. DATA DESTINATIONS
     • Databases: ADO.NET, OLE DB, SQL Server, Oracle
     • File
     • SSAS
     • Rowset
     • Other
  117. CONNECTION MANAGERS
     • A connection to a data source or destination:
       • Provider (for example, ADO.NET, OLE DB, or flat file)
       • Connection string
       • Credentials
     • Project or package level:
       • Project-level connection managers:
         • Can be shared across packages
         • Are listed in Solution Explorer and in the Connection Managers pane for packages in which they are used
       • Package-level connection managers:
         • Can be shared across objects in the package
         • Are listed only in the Connection Managers pane for packages in which they are used
  118. DATA FLOW TRANSFORMATIONS
     • Row Transformations: Character Map, Copy Column, Data Conversion, Derived Column, Export Column, Import Column, OLE DB Command
     • Rowset Transformations: Aggregate, Sort, Percentage Sampling, Row Sampling, Pivot, Unpivot
     • Split & Join Transformations: Conditional Split, Multicast, Union All, Merge, Merge Join, Lookup, Cache, CDC Splitter
     • Audit Transformations: Audit, Row Count
     • BI Transformations: Slowly Changing Dimension, Fuzzy Grouping, Fuzzy Lookup, Term Extraction, Term Lookup, Data Mining Query, Data Cleansing
     • Custom Transformations: Script Component, Custom Component
  119. SYNCHRONOUS VS ASYNCHRONOUS TRANSFORMATIONS
     • Synchronous
       • Transformations such as Derived Column and Data Conversion, where rows flow into memory buffers in the transformation and the same buffers come out.
       • No rows are held, and typically these transformations perform very quickly, with minimal impact on your Data Flow.
     • Asynchronous
       • Asynchronous transformations can cause a block in your Data Flow and slow down your runtime.
       • There are two types of asynchronous transformations:
         • Partially blocking transformations
           • Create new memory buffers for the output of the transformation.
           • e.g. the Union All transformation
         • Fully blocking transformations
           • e.g. the Sort and Aggregate transformations
           • Create new memory buffers for the output of the transformation, but cause a full block of the data.
           • These fully blocking transformations represent the single largest slowdown in SSIS and should be considered carefully in any architecture decisions you must make.
  120. Row Transformations
     • Character Map
     • Copy Column
     • Data Conversion
     • Derived Column
     • Export Column
     • Import Column
     • OLE DB Command
  121. CHARACTER MAP
     • Performs common character translations
     • The modified column can be added as a new column or used to update the original column
     • The available operation types are:
       • Byte Reversal: reverses the order of the bytes. For example, for the data 0x1234 0x9876, the result is 0x4321 0x6789
       • Full Width: converts the half-width character type to full width
       • Half Width: converts the full-width character type to half width
       • Hiragana: converts the Katakana style of Japanese characters to Hiragana
       • Katakana: converts the Hiragana style of Japanese characters to Katakana
       • Linguistic Casing: applies the regional linguistic rules for casing
       • Lowercase: changes all letters in the input to lowercase
       • Traditional Chinese: converts simplified Chinese characters to traditional Chinese
       • Simplified Chinese: converts traditional Chinese characters to simplified Chinese
       • Uppercase: changes all letters in the input to uppercase
  122. COPY COLUMN
     • A very simple transformation that copies the output of a column to a clone of itself.
     • This is useful if you wish to create a copy of a column before you perform some elaborate transformations.
       • You could then keep the original value as your control subject and use the copy as the modified column.
  123. DATA CONVERSION
     • Performs a similar function to the CONVERT or CAST functions in T-SQL.
     • The Output Alias is the column name you want to assign to the column after it is transformed.
       • If you don't assign it a new name, it will later be displayed as Data Conversion: ColumnName in the Data Flow.
  124. DERIVED COLUMN
     • Creates a new column that is calculated (derived) from the output of another column or set of columns.
     • One of the most important transformations in the Data Flow.
     • Examples:
       • Multiply the quantity of an order by the cost of the order to derive the total cost of the order
       • Find the current date, or fill in the blanks in the data by using the ISNULL function
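The two examples above as Derived Column expressions, with hypothetical column names:

```text
[Quantity] * [UnitCost]
ISNULL([OrderDate]) ? GETDATE() : [OrderDate]
```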
  125. 125. EXPORT COLUMN• Exports data to a file from the Data Flow• Unlike the other transformations, the Export Column Transformationdoesn’t need a destination to create the file• A common example is to extract blob-typedata from fields in a database and create filesin their original formats to be stored in a filesystem or viewed by a format viewer, such asMicrosoft Word or Microsoft Paint.
  126. 126. IMPORT COLUMN• The Import Column Transformation is apartner to the Export Column transformation.• These transformations do the work oftranslating physical files from system filestorage paths into database blob-type fields,and vice versa
  127. 127. OLE DB COMMAND• Designed to execute a SQL statement foreach row in an input stream.• This task is analogous to an ADO Command object being created,prepared, and executed for each row of a result set.• This transformation should be avoidedwhenever possible.• It’s a better practice to land the data into a staging table using an OLE DBDestination and perform an update with a set-based process in the ControlFlow with an Execute SQL Task.
128. Rowset Transformations
• Aggregate
• Sort
• Percentage Sampling
• Row Sampling
• Pivot
• Unpivot
129. AGGREGATE
• Aggregates data from the Data Flow to apply certain T-SQL functions that are done in a GROUP BY statement.
• The most important option is Operation.
  • Group By: Breaks the data set into groups by the column you specify
  • Average: Averages the selected column’s numeric data
  • Count: Counts the records in a group
  • Count Distinct: Counts the distinct non-NULL values in a group
  • Minimum: Returns the minimum numeric value in the group
  • Maximum: Returns the maximum numeric value in the group
  • Sum: Returns the sum of the selected column’s numeric data in the group
• It is a fully blocking transformation.
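The GROUP BY-style operations listed above behave like this minimal Python sketch (the territory/amount data is made up for illustration):

```python
from collections import defaultdict

# Hypothetical input rows: (Territory, Amount)
rows = [("North", 10), ("South", 5), ("North", 20), ("South", 5)]

groups = defaultdict(list)
for territory, amount in rows:   # Group By: Territory
    groups[territory].append(amount)

# One output row per group, with the aggregate operations applied
summary = {
    t: {"Sum": sum(v), "Count": len(v),
        "Count Distinct": len(set(v)),
        "Minimum": min(v), "Maximum": max(v),
        "Average": sum(v) / len(v)}
    for t, v in groups.items()
}
print(summary["North"]["Sum"])             # 30
print(summary["South"]["Count Distinct"])  # 1
```

Note that, like the SSIS Aggregate, this cannot emit anything until all input rows have been consumed, which is what "fully blocking" means.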
130. SORT
• It is a fully blocking asynchronous transformation.
• Enables you to sort data based on any column in the path.
• When possible avoid using it, because of speed.
• However, some transformations, like the Merge Join and Merge, require the data to be sorted.
• If you place an ORDER BY statement in the OLE DB Source, SSIS is not aware of the ORDER BY statement.
• If you have an ORDER BY clause in your T-SQL, you can notify SSIS in the Advanced Editor that the data is already sorted, obviating the need for the Sort Transformation.
131. PERCENTAGE & ROW SAMPLING
• Enable you to take the data from the source and randomly select a subset of data.
• The transformation produces two outputs that you can select.
  • One output is the data that was randomly selected,
  • and the other is the data that was not selected.
• You can use this to send a subset of data to a development or test server.
• The most useful application of this transformation is to train a data-mining model.
  • You can use one output path to train your data-mining model,
  • and the other sampling to validate the model.
• The Percentage Sampling enables you to select the percentage of rows.
• The Row Sampling Transformation enables you to specify how many rows you wish to be output randomly.
• You can specify the seed that will randomize the data.
  • If you select a seed and run the transformation multiple times, the same data will be output to the destination.
  • If you uncheck this option, which is the default, the seed will be automatically incremented by one at runtime, and you will see random data each time.
132. PIVOT & UNPIVOT
• A pivot table is a result of cross-tabulated columns generated by summarizing data from a row format.
• Unpivot is the opposite of Pivot.
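The cross-tabulation idea can be sketched in a few lines of Python; the Year/Quarter/Amount columns are made up for illustration:

```python
# Pivot: rows of (Year, Quarter, Amount) become one row per Year
# with one column per Quarter.
rows = [(2012, "Q1", 10), (2012, "Q2", 20), (2013, "Q1", 5)]

pivot = {}
for year, quarter, amount in rows:
    pivot.setdefault(year, {})[quarter] = amount

print(pivot[2012]["Q2"])  # 20

# Unpivot: flatten the cross-tab back into row format
unpivoted = [(y, q, a) for y, cols in pivot.items() for q, a in cols.items()]
print(sorted(unpivoted) == sorted(rows))  # True
```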
133. Split & Join Transformations
• Conditional Split
• Multicast
• Union All
• Merge
• Merge Join
• Lookup
• Cache
• CDC Splitter
134. CONDITIONAL SPLIT
• Add complex logic to your Data Flow.
• This transformation enables you to send the data from a single data path to various outputs or paths based on conditions that use the SSIS expression language.
• Is similar to a CASE decision structure in a programming language.
• Also provides a default output.
  • If a row matches no expression, it is directed to the default output.
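The CASE-like routing above, where each row goes down exactly one output, can be sketched in Python (the conditions and output names are invented for the example):

```python
# Each row is routed to exactly one output, like a CASE statement;
# rows matching no condition go to the default output.
def split(row):
    if row["Amount"] >= 1000:
        return "LargeOrders"
    elif row["Amount"] >= 100:
        return "MediumOrders"
    return "Default"

outputs = {"LargeOrders": [], "MediumOrders": [], "Default": []}
for row in [{"Amount": 50}, {"Amount": 500}, {"Amount": 5000}]:
    outputs[split(row)].append(row)

print({k: len(v) for k, v in outputs.items()})
# {'LargeOrders': 1, 'MediumOrders': 1, 'Default': 1}
```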
135. MULTICAST
• Send a single data input to multiple output paths.
• Is similar to the Conditional Split.
  • Both transformations send data to multiple outputs.
  • The Multicast will send all the rows down every output path, whereas the Conditional Split will conditionally send each row down exactly one output path.
136. MERGE
• Merge data from two paths into a single output.
• Is useful when you
  • wish to break out your Data Flow into a path that handles certain errors and then merge it back into the main Data Flow downstream after the errors have been handled
  • wish to merge data from two Data Sources.
• But has some restrictions:
  • The data must be sorted beforehand. You can do this by using the Sort Transformation prior to the merge or by specifying an ORDER BY clause in the source connection.
  • The metadata must be the same between both paths. For example, the CustomerID column can’t be a numeric column in one path and a character column in another path.
• If you have more than two paths, you should choose the Union All Transformation.
137. UNION ALL
• Similar to the Merge Transformation.
• You can merge data from two or more paths into a single output.
• Does not require sorted data.
• This transformation fixes minor metadata issues.
  • For example, if you have one input that is a 20-character string and another that is 50 characters, the output of the Union All Transformation will be the longer 50-character column.
• You need to open the Union All Transformation Editor only if the transformations that feed the Union All Transformation have different column names.
138. MERGE JOIN
• Merge the output of two inputs and perform an INNER or OUTER join on the data.
• If both inputs are in the same database, it would be faster to perform the join at the OLE DB Source level through T-SQL, rather than use a transformation.
• Useful when you have two different Data Sources you wish to merge.
139. LOOKUP
• Performs lookups by joining data in input columns with columns in a reference dataset.
• You use the lookup to access additional information in a related table that is based on values in common columns.
• Lookup caching mechanism
  • Full-Cache Mode: stores all the rows resulting from a specified query in memory.
  • No-Cache Mode: you can choose to cache nothing; for each input row the component sends a request to the reference table in the database server to ask for a match.
  • Partial-Cache Mode: this mode caches only the most recently used data within the memory; as soon as the cache grows too big, the least-used cache data is thrown away.
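Full-cache mode essentially amounts to loading the reference set into an in-memory map once and matching each input row locally; the reference data and column names below are hypothetical:

```python
# Sketch of full-cache Lookup: reference loaded once, matched per row.
reference = {1: "Bikes", 2: "Components", 3: "Clothing"}  # CategoryID -> Name

matched, unmatched = [], []
for row in [{"CategoryID": 2}, {"CategoryID": 99}]:
    name = reference.get(row["CategoryID"])
    if name is None:
        unmatched.append(row)      # no-match output
    else:
        row["CategoryName"] = name
        matched.append(row)        # match output, enriched with the lookup column

print(matched[0]["CategoryName"], len(unmatched))  # Components 1
```

In no-cache mode the `reference.get` step would instead be a round trip to the database server for every row, which is why full cache is usually preferred when the reference set fits in memory.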
140. CACHE
• Generates a reference dataset for the Lookup Transformation by writing data from a connected data source in the data flow to a Cache connection manager.
• You can use the Cache connection manager when you want to configure the Lookup Transformation to run in the full cache mode.
• In this mode, the reference dataset is loaded into cache before the Lookup Transformation runs.
141. CDC SPLITTER
• Splits a single flow of change rows from a CDC source data flow into different data flows for Insert, Update and Delete operations.
• The data flow is split based on the required column __$operation and its standard values in SQL Server 2012 change tables.
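The routing can be sketched in Python using the standard __$operation values from SQL Server change tables (1 = delete, 2 = insert, 3 = update before-image, 4 = update after-image); the sample rows are made up:

```python
# Route CDC change rows by the __$operation column.
rows = [{"__$operation": 2, "ID": 1},   # insert
        {"__$operation": 1, "ID": 2},   # delete
        {"__$operation": 4, "ID": 3}]   # update (after-image)

inserts = [r for r in rows if r["__$operation"] == 2]
deletes = [r for r in rows if r["__$operation"] == 1]
updates = [r for r in rows if r["__$operation"] in (3, 4)]

print(len(inserts), len(deletes), len(updates))  # 1 1 1
```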
142. Audit Transformations
• Audit
• RowCount
143. AUDIT
• Allows you to add auditing data to your Data Flow.
• Because of acts such as HIPAA and Sarbanes-Oxley (SOX) governing audits, you often must be able to track who inserted data into a table and when.
• The task is easy to configure.
  • Simply select the type of data you want to audit in the Audit Type column and then name the column that will be output to the flow.
• Following are some of the available options:
  • Execution instance GUID: GUID that identifies the execution instance of the package
  • Package ID: Unique ID for the package
  • Package name: Name of the package
  • Version ID: Version GUID of the package
  • Execution start time: Time the package began
  • Machine name: Machine on which the package ran
  • User name: User who started the package
  • Task name: Data Flow Task name that holds the Audit Task
  • Task ID: Unique identifier for the Data Flow Task that holds the Audit Task
144. ROWCOUNT
• Provides the capability to count rows in a stream that is directed to its input source.
• This transformation places that count into a variable that can then be used in the Control Flow.
145. BI Transformations
• Slowly Changing Dimension
• Fuzzy Grouping
• Fuzzy Lookup
• Term Extraction
• Term Lookup
• Data Mining Query
• Data Cleansing
146. SLOWLY CHANGING DIMENSION
• Provides a great head start in helping to solve a common, classic changing-dimension problem that occurs in the outer edge of your data model, the dimension or lookup tables.
• A dimension table contains a set of discrete values with a description and often other measurable attributes such as price, weight, or sales territory.
• The classic problem is what to do in your dimension data when an attribute in a row changes, particularly when you are loading data automatically through an ETL process.
• This transformation can shave days off of your development time in relation to creating the load manually through T-SQL, but it can add time because of how it queries your destination and how it updates with the OLE DB Command Transform (row by row).
147. FUZZY LOOKUP
• Performs data cleaning tasks
  • standardizing data, correcting data, and providing missing values.
• Uses an equi-join to locate matching records in the reference table.
  • It returns records with at least one matching record, and returns records with no matching records.
• Uses fuzzy matching to return one or more close matches in the reference table.
• Usually follows a Lookup Transformation in a package data flow.
  • First, the Lookup Transformation tries to find an exact match.
  • If it fails, the Fuzzy Lookup Transformation provides close matches from the reference table.
148. FUZZY GROUPING
• Performs data cleaning tasks by identifying rows of data that are likely to be duplicates and selecting a canonical row of data to use in standardizing the data.
• Requires a connection to an instance of SQL Server to create the temporary SQL Server tables that the transformation algorithm requires to do its work.
  • The connection must resolve to a user who has permission to create tables in the database.
• Produces one output row for each input row. Each row has the following additional columns:
  • _key_in: a column that uniquely identifies each row.
  • _key_out: a column that identifies a group of duplicate rows. Rows with the same value in _key_out are part of the same group, and the _key_out value for a group corresponds to the value of _key_in in the canonical data row.
  • _score: a value between 0 and 1 that indicates the similarity of the input row to the canonical row.
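The _key_in / _key_out / _score output shape can be illustrated with a toy Python sketch; difflib stands in for the real (and much more sophisticated) similarity algorithm, and the threshold and names are invented:

```python
import difflib

# Toy sketch of Fuzzy Grouping output columns.
names = ["Contoso Ltd", "Contoso Ltd.", "Fabrikam Inc"]

out = []
for key_in, name in enumerate(names, start=1):
    key_out, score = key_in, 1.0          # assume canonical until matched
    for prior in out:                     # compare against earlier rows
        s = difflib.SequenceMatcher(None, name.lower(),
                                    prior["name"].lower()).ratio()
        if s > 0.85:                      # arbitrary duplicate threshold
            key_out, score = prior["_key_out"], s
            break
    out.append({"name": name, "_key_in": key_in,
                "_key_out": key_out, "_score": score})

print([r["_key_out"] for r in out])  # [1, 1, 3]: first two grouped as duplicates
```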
149. TERM EXTRACTION
• A tool to mine free-flowing text for English word and phrase frequency.
• If you have ever done some word and phrase analysis on websites for better search engine placement, you are familiar with the job that this transformation performs.
• Based on the Term Frequency and Inverse Document Frequency formula:
  • TFIDF = (frequency of term) * log((# rows in sample) / (# rows with term or phrase))
• Outputs two columns:
  • a text phrase
  • a statistical value for the phrase relative to the total input stream
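The formula above can be applied directly in Python to a toy sample of rows (the natural log is used here; the slide does not specify the base):

```python
import math

# TFIDF = (frequency of term) * log((# rows in sample) / (# rows with term))
rows = ["sql server ssis", "ssis data flow", "sql server agent", "data flow task"]

def tfidf(term, rows):
    freq = sum(r.split().count(term) for r in rows)          # frequency of term
    rows_with_term = sum(1 for r in rows if term in r.split())
    return freq * math.log(len(rows) / rows_with_term)

print(round(tfidf("ssis", rows), 3))  # 2 * log(4/2) = 1.386
```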
150. TERM LOOKUP
• Uses the same algorithms and statistical models as the Term Extraction Transformation to break up an incoming stream into noun or noun phrase tokens.
• It is designed to compare those tokens to a stored word list and output a matching list of terms and phrases with simple frequency counts.
• Generating statistics on known phrases of importance.
  • A real-world application of this would be to pull out all the customer service notes that had a given set of terms or that mention a competitor’s name.
151. DATA MINING QUERY
• Typically is used to fill in gaps in your data or predict a new column for your Data Flow.
• Optionally you can add columns, such as the probability of a certain condition being true.
• Usage Examples
  • You could take columns, such as number of children, household income, and marital status, to predict a new column that states whether the person owns a house or not.
  • You could predict what customers would want to buy based on their shopping cart items.
  • You could fill the gaps in your data where customers didn’t enter all the fields in a questionnaire.
152. DATA CLEANSING
• Performs advanced data cleansing on data.
• A business analyst creates a series of business rules that declare what good data looks like.
• Creates domains that define data in your company, such as what a Company Name column should always look like.
153. Data Flow
154. Custom Transformations
• Script Component
• Custom Component
155. SCRIPT COMPONENT
• Enables you to write custom .NET scripts as
  • Transformations
  • Sources
  • Destinations
• Some of the things you can do with this transformation:
  • Create a custom transformation that would use a .NET assembly to validate credit card numbers or mailing addresses.
  • Validate data and skip records that don’t seem reasonable.
  • Read from a proprietary system for which no standard provider exists.
  • Write a custom component to integrate with a third-party vendor.
• Scripts used as sources can support multiple outputs.
• You have the option of precompiling the scripts for runtime efficiency.
156. CUSTOM COMPONENT
• In a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet.
• Use Visual Studio and the Class Library project template.
• Reference the following assemblies:
  • Microsoft.SqlServer.DTSPipelineWrap
  • Microsoft.SqlServer.DTSRuntimeWrap
  • Microsoft.SqlServer.ManagedDTS
  • Microsoft.SqlServer.PipelineHost
• In addition, the component needs to:
  • Provide a strong name key for signing the assembly.
  • Set the build output location to the PipelineComponents folder.
  • Use a post-build event to install the assembly into the GAC.
  • Set assembly-level attributes in the AssemblyInfo.cs file.
157. Script Component
158. OPTIMIZING DATA FLOW PERFORMANCE
• Optimize queries:
  • Select only the rows and columns that you need
• Avoid unnecessary sorting:
  • Use presorted data where possible
  • Set the IsSorted property where applicable
• Configure Data Flow task properties:
  • Buffer size
  • Temporary storage location
  • Parallelism
  • Optimized mode
159. Q&A
Questions And Answers
160. SUMMARY
• Introduction to SSIS
• SSIS Tools
• Variables, Parameters, Expressions
• SSIS Tasks
• Containers
• Data Flows
161. Thank you