Using PowerShell to Simplify your ETL

© All Rights Reserved

    Using PowerShell to Simplify your ETL: Presentation Transcript

    • Scott Weinstein
      Principal
      Lab49
      weblogs.asp.net/sweinstein / @ScottWeinstein
      Using PowerShell to simplify your ETL
    • SSIS Considered Harmful
      Visual programming is a terrible interface for ETL work
      The history of visual UI designers is instructive here
      No way to copy/share “code samples”
      Low information density
      Even worse
      It’s not even a good designer
      Poor defaults, poor layout
      Nested menus of property grids are hard to use
      No real VCS ability
      Sure it’s XML, but…
      No diff
      No merge
      Demo?
    • But what about all the features?
      Some are trivial:
      Foreach Loop Container
      Execute SQL Task
      Send Mail Task
      OMG!
      Many are useful, but not hard to recreate
      Demos soon
      Others may be worth consideration
      Fuzzy matching
    • But what about performance?
      The advantages for data flow are real, but not that large
      A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
    • The Answer
      PowerShell
      A general purpose scripting and automation language
      Wide adoption
      MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit
      Others: VMware, Quest, IBM WebSphere
    • PowerShell Features
      Takes the best of Unix, VMS, Perl
      Adds integration with .NET, COM, WMI
      Also adds:
      REPL environment
      Full scripting/programming
      Flexible typing model
      Pipes
      Regular expressions
      XML
      Navigation Providers
      Built in command parser
      Standardized syntax for commands
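
A few of the features listed above can be seen in one short sketch. The sample lines and the XML document are invented for illustration; only built-in cmdlets and .NET types are used.

```powershell
# Pipes and regular expressions: parse "name=value" lines into objects.
$lines = 'server=db01', 'port=1433', '# a comment', 'database=Staging'
$settings = $lines | ForEach-Object {
    if ($_ -match '^(\w+)=(.+)$') {
        # $Matches holds the capture groups of the last successful -match.
        New-Object psobject -Property @{ Name = $Matches[1]; Value = $Matches[2] }
    }
}

# XML: attributes and child elements are exposed as properties.
[xml]$doc = '<job><step task="Clear"/><step task="Load"/></job>'
$tasks = $doc.job.step | ForEach-Object { $_.task }

# .NET integration: call a static framework method directly.
$extension = [System.IO.Path]::GetExtension('extract.csv')
```

The comment line is dropped because it does not match the pattern, so `$settings` holds three objects.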
    • Introducing PSIS
      http://psis.codeplex.com/
      Yes, the name is derivative
      Right now, just a playground for some demos
      Bulk Data Transfer
      Star-Schema Populator
      Execute SQL
    • PSIS demo
      Just the ETL
      No logging, error reporting, etc.
      Some typical ETL tasks
      Clear staging tables
      Bulk transfer data from one DB to another
      Populate a star schema
      Fact table and dimension tables
      Extract data to a .csv file
    • Clear staging tables
      Use Invoke-MSSqlCommand
      Use TRUNCATE TABLE <tablename>
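
The transcript does not show the parameters of `Invoke-MSSqlCommand`, so the sketch below swaps in `System.Data.SqlClient` directly rather than guessing at the PSIS cmdlet. `Get-TruncateStatement` and `Clear-StagingTable` are hypothetical helper names, not part of PSIS.

```powershell
# Hypothetical helper: builds the T-SQL for one table. Each part of the name
# is bracket-quoted so names with spaces or reserved words still work.
function Get-TruncateStatement([string]$TableName) {
    $quoted = ($TableName -split '\.' |
        ForEach-Object { '[' + $_.Replace(']', ']]') + ']' }) -join '.'
    "TRUNCATE TABLE $quoted"
}

# Hypothetical helper: truncates each table over a single connection.
function Clear-StagingTable {
    param([string]$ConnectionString, [string[]]$TableName)
    $conn = New-Object System.Data.SqlClient.SqlConnection $ConnectionString
    $conn.Open()
    try {
        foreach ($name in $TableName) {
            $cmd = $conn.CreateCommand()
            $cmd.CommandText = Get-TruncateStatement $name
            [void]$cmd.ExecuteNonQuery()
        }
    }
    finally { $conn.Dispose() }
}
```

A call might look like `Clear-StagingTable -ConnectionString $cs -TableName 'dbo.StageOrders', 'dbo.StageCustomers'`, where `$cs` and the table names are placeholders.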
    • Bulk transfer
      At its core, we’ll use .NET’s SqlBulkCopy
      Under the covers, it uses the bulk-load (BCP) part of the TDS protocol
      For concurrency we’ll use the Task Parallel Library
      We’ll simplify things by making the destination tables map directly to the source tables
      Select statements could provide needed mapping
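
A minimal sketch of the single-table case, assuming the destination table has the same name and column layout as the source. `Copy-SqlTable` is a hypothetical name and this is not the PSIS implementation; the Task Parallel Library concurrency from the talk is left out.

```powershell
# Hypothetical helper: streams one table from source to destination.
# $TableName is placed into the SQL text as-is, so it must be trusted input.
function Copy-SqlTable {
    param(
        [string]$SourceConnectionString,
        [string]$DestinationConnectionString,
        [string]$TableName,
        [int]$BatchSize = 10000
    )
    $src = New-Object System.Data.SqlClient.SqlConnection $SourceConnectionString
    $src.Open()
    try {
        $cmd = $src.CreateCommand()
        # A SELECT with aliases or expressions here could do the column mapping.
        $cmd.CommandText = "SELECT * FROM $TableName"
        $reader = $cmd.ExecuteReader()
        try {
            # Given a connection string, SqlBulkCopy manages its own connection.
            $bulk = New-Object System.Data.SqlClient.SqlBulkCopy $DestinationConnectionString
            try {
                $bulk.DestinationTableName = $TableName
                $bulk.BatchSize = $BatchSize
                $bulk.BulkCopyTimeout = 0   # 0 means no time limit
                # Rows are streamed from the reader rather than loaded into memory.
                $bulk.WriteToServer($reader)
            }
            finally { $bulk.Close() }
        }
        finally { $reader.Dispose() }
    }
    finally { $src.Dispose() }
}
```

Passing a data reader to `WriteToServer` keeps memory use flat regardless of table size, which matters more for large transfers than the exact batch size.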
    • Populate a Star-Schema
      Destination
    • Populate a Star-Schema
      Source: a view on the staging tables, creating a flat, denormalized row of all dimension and fact values
      Use a convention to indicate which columns belong to the fact table and which to dimension tables
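
The transcript does not say what the convention is. Purely as an illustration, the sketch below assumes a column-name prefix: `Fact_<measure>` for fact columns and `<Dimension>_<attribute>` for dimension attributes. Both the convention and the `Split-StarColumn` helper are hypothetical.

```powershell
# ASSUMED convention (not from the talk): 'Fact_<measure>' marks a fact column,
# '<Dimension>_<attribute>' marks an attribute of that dimension table.
function Split-StarColumn([string[]]$ColumnName) {
    $facts = @()
    $dimensions = @{}
    foreach ($col in $ColumnName) {
        # Split on the first underscore only: prefix, then the rest of the name.
        $prefix, $rest = $col -split '_', 2
        if ($prefix -eq 'Fact') {
            $facts += $rest
        }
        else {
            if (-not $dimensions.ContainsKey($prefix)) { $dimensions[$prefix] = @() }
            $dimensions[$prefix] += $rest
        }
    }
    New-Object psobject -Property @{ Facts = $facts; Dimensions = $dimensions }
}

$plan = Split-StarColumn 'Customer_Name', 'Customer_City', 'Product_Sku', 'Fact_Quantity', 'Fact_Amount'
```

Here `$plan.Facts` lists the measures and `$plan.Dimensions` maps each dimension table to its attribute columns, which is enough information to drive a code generator.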
    • Populate a Star-Schema
      Code-generate a populator
      Create functions which, given a source row, return (or create) the corresponding dimension key
      Run it with the TPL for performance
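
The "return or create" step can be sketched with an in-memory lookup. This is an illustration under stated assumptions, not the generated PSIS code: `New-DimensionKeyLookup` is a hypothetical name, keys are simple counters, and the insert into the dimension table that a real load would need is omitted.

```powershell
# Hypothetical factory: returns a scriptblock that maps a dimension's attribute
# values to a surrogate key, creating a key the first time a combination is seen.
function New-DimensionKeyLookup {
    $state = @{ Map = @{}; Next = 1 }
    {
        param([string[]]$AttributeValue)
        # Assumes '|' never appears in the data. Hashtable literals compare
        # string keys case-insensitively, so 'alice' and 'Alice' share a key.
        $id = $AttributeValue -join '|'
        if (-not $state.Map.ContainsKey($id)) {
            # A real populator would also INSERT the new dimension row here.
            $state.Map[$id] = $state.Next
            $state.Next++
        }
        $state.Map[$id]
    }.GetNewClosure()   # captures $state so each lookup keeps its own map
}

$customerKey = New-DimensionKeyLookup
$k1 = & $customerKey 'Alice', 'London'   # first sighting: a new key
$k2 = & $customerKey 'Bob', 'Paris'      # a different customer: another new key
$k3 = & $customerKey 'Alice', 'London'   # seen before: same key as $k1
```

One lookup per dimension table keeps the key spaces separate. This version is single-threaded; running it concurrently would need a thread-safe map.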
    • Populate a Star-Schema
      Areas of improvement
      Add support for slowly changing dimensions
      Add support for lower memory requirements
      query DB for dimension values on demand
      Measure and tune performance at the micro-benchmark level
      Concurrent dictionaries
      Partial loads
    • Extract to file
      Invoke-MSSqlCommand
      Export-Csv
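
A short sketch of the file-writing half. Because the parameters of `Invoke-MSSqlCommand` are not shown in the transcript, stand-in objects take the place of the query results; the column names and file name are made up.

```powershell
# Stand-in rows; in the real pipeline these would come from the query.
$rows = @(
    New-Object psobject -Property @{ OrderId = 1; Quantity = 3 }
    New-Object psobject -Property @{ OrderId = 2; Quantity = 7 }
)

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'extract.csv'

# Select-Object fixes the column order; -NoTypeInformation suppresses the
# "#TYPE" header line that older PowerShell versions write by default.
$rows | Select-Object OrderId, Quantity | Export-Csv -Path $path -NoTypeInformation
```

Anything that emits objects can feed `Export-Csv` this way, so the query step and the file step stay independent.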
    • Others?