SQL Server 2008 Integration Services

Presentation on Integration Services in SQL Server 2008.

Ing. Eduardo Castro Martinez, PhD
Microsoft SQL Server MVP
http://ecastrom.blogspot.com
http://comunidadwindows.org

Published in: Technology

  • Slide Title: Title Slide
    • Key Message: Title Slide
    • Slide Builds: 0
    • Slide Script: Hello and welcome to this Microsoft TechNet session on Advanced SQL Server 2005 Integration Services. My name is {insert name}.
    • Slide Transition: Let’s start this session by going into more detail on exactly what we will be covering.
  • Slide Title: What We Will Cover
    • Key Message: What we will cover
    • Slide Builds: 2
    • Slide Script: We’re going to cover three advanced techniques. The first thing we’ll cover is using Web Services with SQL Server Integration Services. We’ve had a number of questions about how to use Integration Services within the Web Services environment. Organizations now are exposing more of their data and business processes through Web Services, so it’s natural to want to use Web Services with the data integration processes. [BUILD1] We’ll also talk about text mining techniques for Integration Services. Much of the data found in businesses is unstructured, as we’ll discuss during this session. Businesses need the ability to pull key words and phrases from this unstructured, free-text data and build warehouses, reference tables, and other useful data structures from it. We’ll talk about how to use text mining with Integration Services to achieve some of those goals. [BUILD2] Finally, we’ll talk about how to use data mining within Integration Services. The ability to use data mining within Integration Services is one of the most compelling new features of Integration Services, and we’ll discuss some of the business cases for using it.
    • Slide Transition: As with most TechNet sessions, some prior experience of Microsoft technologies or similar technologies is always helpful. Here’s a brief overview of what would be helpful, but not essential, for this session.
  • Slide Title: Helpful Experience
    • Key Message: Helpful Experience
    • Slide Builds: 1
    • Slide Script: As we go through today's session, you will hear various Microsoft acronyms and terminology. While we will explain all new terms related to today's session, there are some general terms from the industry or from other versions of Microsoft products that we may not spend time on. To help you out, we have listed the areas that it may be helpful to be familiar with, either prior to this session or to reference afterwards. A basic knowledge of how to build data flows within SQL Server Integration Services is required. [BUILD1] Familiarity with scripting using languages such as VBScript is helpful, but not absolutely essential.
    • Slide Transition: To cover the topics mentioned and keep the session flow going, we have divided the session up into the following agenda items.
  • Slide Title: Agenda: Using Web Services with SSIS
    • Key Message: This agenda item discusses how to use Web Services with Integration Services.
    • Slide Builds: 0
    • Slide Script: First, we’ll look at how to use the Web Services with Integration Services, including using the Web Service as part of the Script component to retrieve and process data through the Web Service. [BUILD1] After that, we’ll take a look at how to do text mining with Integration Services, and discuss why it’s an important feature of Integration Services. [BUILD2] Finally, we’ll look at how to use data mining tools directly within an Integration Services package and what the ramifications are for being able to use this interesting new feature of Integration Services.
    • Slide Transition: Let’s talk about using Web services with SQL Server Integration Services.
  • Can use like DTS if you want to…
  • Compare the old ETL approach to the SSIS approach. Especially mention: flexible sources, flexible transformations, and flexible destinations (especially OLEDB).
  • Demo 1 – We are going to use a small piece of the Project Real data. For those of you who have not heard of Project Real: it is a sample BI implementation in SQL 2005 based on Barnes and Noble. All the schemas, ETLs, reports and data are published along with a set of white papers and best-practice guidelines for large-scale projects. We are just looking at loading Vendors (Suppliers in non-US speak!).
    • Scenario 1: We have 250,000 active vendors and we wish to load them from our source database into our data warehouse. Accounts have provided a list of blacklisted suppliers, and we need to match the vendors against it to add an attribute to each supplier indicating whether it is blacklisted. Show the query plan. Why use Lookup? Show the execution and, while it is executing, talk about the pipeline, buffering and caching. More to come later on why to use Data Flows.
    • Scenario 2: Our beloved accounts department can only supply the blacklist as an Excel file, and we need to import it.
    • Scenario 3 (Workflow): Accounts now want you to filter out active vendors who are on the blacklist and insert them into a table in the DWH for later investigation. After the demo, show how to do some of it in Excel (Demo1_SQL). Discuss the limits of SQL (one table scan per task).
    • Scenario 4: Accounts (who have no concept of SOA or databases) have said they can only supply a spreadsheet in workbook form; each page is filled in by a different department, and departments come and go. They want you to record the reason/department for the blacklisting (the worksheet name) and to email the head of finance a spreadsheet with any active suppliers that are flagged as blacklisted, and the department that flagged them.
    • Discuss: Sequence Containers, Variables, Scripting, For Loop (data flow 1). Multicast, Conditional Split, Unicode issues (data flow 2), Send Mail task (control flow). Use of Control Flow for the rest of the tables.
  • Demo 1 – use of Scripting to Infer a Dimension Explain Early Arriving Facts e.g. Sales arriving before customer.
  • Transformation types and parallelism:
    • Row transformations - Row transformations either manipulate data or create new fields using the data that is available in that row. Examples of SSIS components that perform row transformations include Derived Column, Data Conversion, Multicast, and Lookup. While these components might create new columns, row transformations do not create any additional records. Because each output row has a 1:1 relationship with an input row, row transformations are also known as synchronous transformations. Row transformations have the advantage of reusing existing buffers and do not require data to be copied to a new buffer to complete the transformation.
    • Partially blocking transformations - Partially blocking transformations are often used to combine datasets. They tend to have multiple data inputs. As a result, their output may have the same, greater, or fewer records than the total number of input records. Since the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations. Examples of partially blocking transformation components available in SSIS include Merge, Merge Join, and Union All. With partially blocking transformations, the output of the transformation is copied into a new buffer and a new thread may be introduced into the data flow.
    • Blocking transformations - Blocking transformations must read and process all input records before creating any output records. Of all of the transformation types, these transformations perform the most work and can have the greatest impact on available resources. Example components in SSIS include Aggregate and Sort. Like partially blocking transformations, blocking transformations are also considered to be asynchronous. Similarly, when a blocking transformation is encountered in the data flow, a new buffer is created for its output and a new thread is introduced into the data flow.
    • Parallelism – Packages, Tasks and Transformations can be executed in parallel.
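The difference between a synchronous row transformation and a blocking transformation, as described in the note above, can be sketched outside SSIS with plain Python generators. This is only an illustration under stated assumptions: `source`, `derived_column` and `sort_rows` are hypothetical names, not SSIS APIs, and the sketch ignores SSIS buffers and threads. It shows only how many input rows each kind of transformation has to read before it can emit its first output row.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]


def source(rows: Iterable[Row], read_log: List[float]) -> Iterator[Row]:
    """Hypothetical source that records every row it hands downstream."""
    for row in rows:
        read_log.append(row["id"])
        yield row


def derived_column(rows: Iterable[Row]) -> Iterator[Row]:
    """Synchronous (row) transformation: one output row per input row,
    emitted as soon as that input row arrives."""
    for row in rows:
        row["total"] = row["qty"] * row["price"]  # new column on the same row
        yield row


def sort_rows(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Blocking transformation: every input row must be read into a new
    buffer before the first output row can be produced."""
    buffered = sorted(rows, key=lambda r: r[key])
    yield from buffered


data = [
    {"id": 3, "qty": 1, "price": 2.0},
    {"id": 1, "qty": 2, "price": 5.0},
    {"id": 2, "qty": 4, "price": 1.5},
]

sync_log: List[float] = []
first_sync = next(derived_column(source(data, sync_log)))
rows_read_sync = len(sync_log)          # only one row was needed

blocking_log: List[float] = []
first_sorted = next(sort_rows(source(data, blocking_log), "id"))
rows_read_blocking = len(blocking_log)  # the whole input was needed
```

Pulling a single row through `derived_column` reads one source row, while pulling a single row through `sort_rows` drains the entire source first, which is why blocking components dominate memory use on large inputs.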
  • First Example has a blocking shape, so no parallelism In second Example only destination is in parallel In Third example, everything is in parallel If SQL is your source, look carefully at aggregating in select statement
  • Transcript

    • 1. SQL Server Integration Services. Eduardo Castro, MVP, MCDBA, MCSE, MCAD, MCSD. [email_address] Comunidad Windows Costa Rica
    • 2. What Will We Cover?
      • SSIS Enhancements
      • SSIS Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 3. Helpful Experience (Level 300)
      • Basic knowledge of how to build SSIS data flows
      • Familiarity with scripting
    • 4. Agenda
      • SSIS
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 5. SSIS Environment
      • Integrated with Visual Studio.
      • Integrated with Team Foundation Server.
      • Debugging!
      • Extensible - custom shapes and add-ins
      • Data Source View
      • Object and XML model
    • 6. SSIS – Control Flow
      • Similar to DTS but also:
        • Containers
        • Loops
        • Sequencing
        • Logging
        • Data Flow Task
    • 7. Architecture
    • 8. Data Flow Task – New Paradigm
      [Diagram comparing two approaches. SQL/DTS disk-based approach, with the labels Source, Staging, Prep, Extract, Transform, DWH, Load. SSIS RAM-based approach, with the labels Source, Source-Transform-Destination (Dataflow), DWH.]
    • 9. What Is SQL Server Integration Services?
      • Platform for ETL operations
      • Control flow engine and data flow engine
    • 10. Common Uses for Integration Services
      • Import and export data
      • Integrate heterogeneous data
      • Clean and standardize data
      • Support BI solutions
    • 11. Fundamental Integration Services Concepts
      • Package
      • Control flow
      • Data flow
      • Variable
      • Event handler
    • 12. Integration Services Architecture
      • Integration Services service
      • Object model
      • Integration Services runtime
      • Data flow engine
    • 13. Business Intelligence Development Studio: Five tabs in SSIS Designer
      • Control Flow
      • Data Flow
      • Event Handlers
      • Package Explorer
      • Progress/Execution Results
    • 14. SQL Server Management Studio
      • Run Integration Services packages
      • Monitor running Integration Services packages
      • Manage Integration Services packages
      • Import and export Integration Services packages
    • 15. Integration Services Wizards
      • SQL Server Import and Export Wizard
      • Package Installation Wizard
      • Package Configuration Wizard
      • Package Migration Wizard
    • 16. Command Prompt Utilities
      • Execute Package Utility (dtexecui)
      • dtexec utility
      • dtutil utility
    • 17. Dataflow v SQL – Pros and Cons
      • RAM v Disk Argument
      • Data Flow is fantastic for: workflow, error handling, lookups, calculations, readability, instrumentation, interoperability and inserts.
      • Consider SQL for bulk updates and deletes
      • Consider bcp, BULK INSERT or SELECT INTO for simple imports
      • Consider development time – T-SQL can be faster to develop
    • 18. Demo
      • Data Flow Dive
        • OLEDB Source/Destination
        • Lookups v SQL Joins v Merge Joins
        • Control Flow with Scripting intro
      demonstration
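The "Lookups v SQL Joins" item in this demo rests on the idea that a lookup caches the reference data once and then probes that cache for each row streaming through the data flow. A minimal Python sketch of that idea follows, using the blacklisted-vendor scenario from the speaker notes. All names here (`build_lookup_cache`, `flag_blacklisted`, the `vendor_id` and `reason` columns) are hypothetical and chosen for illustration; this is not how the SSIS Lookup component is implemented.

```python
from typing import Dict, Iterable, Iterator, Optional


def build_lookup_cache(blacklist_rows: Iterable[dict]) -> Dict[int, str]:
    """Read the (small) reference set once and cache it by vendor key."""
    return {row["vendor_id"]: row["reason"] for row in blacklist_rows}


def flag_blacklisted(vendors: Iterable[dict],
                     cache: Dict[int, str]) -> Iterator[dict]:
    """Row-by-row probe of the cache: one output row per input row,
    with two added columns and no extra pass over the reference data."""
    for vendor in vendors:
        reason: Optional[str] = cache.get(vendor["vendor_id"])
        yield {
            **vendor,
            "is_blacklisted": reason is not None,
            "blacklist_reason": reason,
        }


# Hypothetical sample data standing in for the vendor source and blacklist.
vendors = [
    {"vendor_id": 1, "name": "Acme Books"},
    {"vendor_id": 2, "name": "Paper Supplies Ltd"},
    {"vendor_id": 3, "name": "Ink & Co"},
]
blacklist = [{"vendor_id": 2, "reason": "late deliveries"}]

flagged = list(flag_blacklisted(vendors, build_lookup_cache(blacklist)))
blacklisted_names = [v["name"] for v in flagged if v["is_blacklisted"]]
```

The cache is built once from the small list and each vendor row costs a single dictionary probe, which is the property that makes a fully cached lookup attractive when the reference set fits in memory.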
    • 19. Agenda
      • SSIS Enhancements (what's new)
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 20. Agenda
      • SSIS Enhancements (what's new)
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 21.
      • Any .NET-compatible language
      • Three types:
          • Source
          • Transformation
          • Destination
      • Careful of performance!
      • Script component
        • Key component for implementing custom scenarios
    • 22. Scripting Transform
      • PreExecute, ProcessInputRow and PostExecute events
      Public Class ScriptMain
          Inherits UserComponent

          Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
              ' Add your .NET code here to perform row-by-row or
              ' column-by-column operations.
              ' ARE YOU NUTS !!
              If Row.MyField_IsNull Then
                  ' Process
              End If
          End Sub

          Public Overrides Sub PreExecute()
              MyBase.PreExecute()
              ' Add code here to perform tasks before row processing,
              ' e.g. prepare a stored proc.
          End Sub

          Public Overrides Sub PostExecute()
              ' Clean up objects, e.g. that stored proc you prepared earlier.
              MyBase.PostExecute()
          End Sub

      End Class
    • 23. Demo
      • Advanced Scripting
        • Deeper Look Fact Handling
        • Early Arriving Facts / Inferred Dim Problem
        • Off topic look at joys of table partitioning ;-)
      demonstration
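The early arriving facts problem named in this demo (for example, a sale arriving before its customer record) is commonly handled by inserting a placeholder, or inferred, dimension member so the fact row can still be keyed. The sketch below shows that logic in plain Python rather than in an SSIS script component. The `CustomerDimension` class, its column names and the `is_inferred` flag are assumptions made for illustration, not the demo's actual code.

```python
from typing import Dict, List, Optional


class CustomerDimension:
    """Hypothetical in-memory stand-in for a customer dimension table."""

    def __init__(self) -> None:
        self._key_by_id: Dict[str, int] = {}  # business key -> surrogate key
        self.rows: List[dict] = []

    def get_or_infer(self, customer_id: str) -> int:
        """Return the surrogate key, creating an inferred member if the
        customer has not been loaded yet."""
        key: Optional[int] = self._key_by_id.get(customer_id)
        if key is None:
            key = len(self.rows) + 1
            self.rows.append({"customer_key": key, "customer_id": customer_id,
                              "name": None, "is_inferred": True})
            self._key_by_id[customer_id] = key
        return key

    def load_customer(self, customer_id: str, name: str) -> int:
        """The real customer arrives: fill in the inferred row in place,
        so fact rows that already point at it stay valid."""
        key = self.get_or_infer(customer_id)
        row = self.rows[key - 1]
        row["name"] = name
        row["is_inferred"] = False
        return key


dim = CustomerDimension()

# A sale for customer C2 arrives before C2 exists in the dimension.
fact = {"customer_key": dim.get_or_infer("C2"), "amount": 19.99}
inferred_before = dim.rows[0]["is_inferred"]

# Later the customer record itself is loaded.
late_key = dim.load_customer("C2", "Jane Doe")
inferred_after = dim.rows[0]["is_inferred"]
```

The surrogate key handed to the fact row never changes, so the late-arriving customer load only updates attributes and clears the flag.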
    • 24. Agenda
      • SSIS Enhancements (what's new)
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 25. Optimising for Scalability Tips
      • Row transformations, a.k.a. synchronous transformations = good
      • Blocking transformations (e.g. Aggregate, Sort) = bad
      • Partially blocking transformations, a.k.a. asynchronous transformations = sometimes ugly
      • On VLDB avoid using: Row by row processing, recordsets, scripting, data object variables, import column, SCD, memory restricted lookups.
      • Use Parallelism (example next slide)
      • If SQL is your source, use it for aggregating, casting, basic calcs and maybe renaming
    • 26. Optimising for Scalability Tips
      • SQL UDFs give you a performance hit, but the re-usability payoff may be worthwhile.
      • Don’t go overboard on packages: validation, dependencies and complexity will hurt.
      • Use OLEDB Destination with batch size of 10k
      • Stage any large updates or deletes (over 10k records)
      • Don’t bother messing with buffer sizes (DefaultBufferMaxRows already defaults to 10k rows). 10k is the magic number.
      • Use a 64-bit server with 8 cores and 16+ GB RAM ;-)
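The batch-size tips above amount to committing rows in fixed-size chunks rather than one at a time or all at once. A small Python sketch of that chunking follows. The `batches` helper is a hypothetical name, and the 10,000 default simply mirrors the number quoted on the slide; it is not a measured optimum.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], size: int = 10_000) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` rows from any iterable,
    without materialising the whole input first."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 25,000 hypothetical rows split into commits of at most 10,000 rows each.
batch_sizes = [len(batch) for batch in batches(range(25_000))]
```

Each yielded list would correspond to one commit against the destination, so a failure part-way through only loses the current batch.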
    • 27. Parallelism Example
    • 28. Agenda
      • SSIS Enhancements (what's new)
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Excel/Oracle/Linux)
    • 29. SSIS Performance Monitoring
      • Use WMI for resource monitoring
      • Use MOM for enterprise stuff
      • Use SQL logging for everything else: sysdtslog90 (renamed sysssislog in SQL Server 2008)
    • 30. Demo
      • SSIS Performance Monitoring
        • sysdtslog90
        • Analysis in SQL Server
        • Custom reporting with SSRS
      demonstration
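Analysing the SSIS log table usually comes down to pairing each package's start and end events by execution ID. The sketch below imitates that analysis with Python's built-in `sqlite3` module as a stand-in for SQL Server. The table here is a cut-down copy with only four columns (`event`, `source`, `executionid`, `starttime`), which are assumed to match the real log table, and the sample rows are invented. On SQL Server the same query would use `DATEDIFF` instead of SQLite's `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down, assumed shape of the log table (the real table has more columns).
conn.execute(
    "CREATE TABLE sysdtslog90 "
    "(event TEXT, source TEXT, executionid TEXT, starttime TEXT)"
)
conn.executemany(
    "INSERT INTO sysdtslog90 VALUES (?, ?, ?, ?)",
    [
        # Invented sample rows for two package executions.
        ("PackageStart", "LoadVendors", "run-1", "2008-01-01 02:00:00"),
        ("OnError",      "LoadVendors", "run-1", "2008-01-01 02:01:00"),
        ("PackageEnd",   "LoadVendors", "run-1", "2008-01-01 02:03:30"),
        ("PackageStart", "LoadSales",   "run-2", "2008-01-01 03:00:00"),
        ("PackageEnd",   "LoadSales",   "run-2", "2008-01-01 03:00:45"),
    ],
)

# Pair start and end events per execution, then count errors for that run.
QUERY = """
SELECT s.source,
       s.executionid,
       CAST(strftime('%s', e.starttime) AS INTEGER)
         - CAST(strftime('%s', s.starttime) AS INTEGER) AS seconds,
       (SELECT COUNT(*)
          FROM sysdtslog90 AS x
         WHERE x.executionid = s.executionid
           AND x.event = 'OnError') AS errors
  FROM sysdtslog90 AS s
  JOIN sysdtslog90 AS e
    ON e.executionid = s.executionid
   AND e.event = 'PackageEnd'
 WHERE s.event = 'PackageStart'
 ORDER BY seconds DESC
"""

report = conn.execute(QUERY).fetchall()
```

A result set of this shape (package, execution, duration, error count) is the kind of dataset a custom SSRS report could then be built on.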
    • 31. Agenda
      • SSIS Enhancements (what's new)
      • Dataflow – Pros and Cons
      • Advanced Scripting
      • Optimisation for Scalability
      • Performance Monitoring
      • Interoperability (Oracle/Linux)
    • 32. SSIS Interoperability
      • Data sources for OLEDB, Excel, flat file, SSAS, ADO.NET
      • .NET for extensibility and legacy APIs
      • Avoid using SSIS to insert into Oracle/Linux
    • 33. Contact Information
      • Eduardo Castro
      • [email_address]