Blazing SSIS!
       High Performance Design Techniques




             Bhavik Merchant

bhavik.merchant@gmail.com bhavik.merchant@csg.com.au
Tweet, tweet..
• Twitter: @BhavikMerchant
• HashTag: #SQLPASS
Agenda in a Nutshell
•   Introductions
•   Rationale
•   SSIS internal architecture, tuning approach
•   Source, Transforms, Destination
•   Updates, Data Flow tweaks
•   Advanced approaches
•   Tips
•   Testing approach
•   Q&A
Brief Speaker Intro
• Background
  – Certified End-to-End Microsoft BI practitioner
  – Team Lead and Consultant at CSG
  – Microsoft vTSP for BI
  – Trainer (SSAS, SSIS, SSRS, PowerPivot, SharePoint
    BI)
• Experience
  – A variety of BI projects, from SQL Server 2000 to 2008 R2
    and MOSS 2007 to SharePoint 2010, over the past 6+ years
Why this topic?
• Many ways to skin a cat
  – SSIS vs SQL e.g. “transforms”
  – Sometimes multiple approaches even within SSIS
• Some settings are overlooked
• Some settings are obscure
• We will see examples as we go along
SSIS Tuning – Setting the Stage
• SSIS Architecture
  – In-memory engine, but buffers can spool to disk under
    memory pressure
  – Pipeline and buffers: we want to minimise both the size and
    the number of buffers

• SSIS Pipeline
  – We’ll follow this from source to destination
  – A good way to tune your packages
  – Isolate a component from everything downstream of it by
    ending the path in a Row Count or Multicast transform
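A rough way to picture the buffer pipeline and the Row Count trick is a small Python model. It is an illustration only, not SSIS code. The hypothetical `source` emits rows in fixed-size buffers. `row_count_sink` stands in for a Row Count transform used as a dummy destination: it only counts, so any elapsed time belongs to the components upstream of it.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def source(total_rows: int, rows_per_buffer: int) -> Iterator[List[Row]]:
    """Hypothetical source: emits rows in fixed-size buffers, not one at a time."""
    buffer: List[Row] = []
    for i in range(total_rows):
        buffer.append((i, f"name-{i}"))
        if len(buffer) == rows_per_buffer:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # last, partially filled buffer


def row_count_sink(buffers: Iterator[List[Row]]) -> Tuple[int, int]:
    """Stand-in for a Row Count 'dummy destination': it does no real work,
    so time spent is attributable to the components upstream of it."""
    rows = 0
    n_buffers = 0
    for buf in buffers:
        n_buffers += 1
        rows += len(buf)
    return rows, n_buffers


print(row_count_sink(source(25_000, 10_000)))  # (25000, 3)
```

Timing the source against this kind of sink, then adding components back one at a time, is the isolation approach described on the slide.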
Examining the Pipeline - Source
• Tune the source
  – I thought we were inserting… OK then, why? Because nothing
    downstream can run faster than the source feeds it

• How?
  –   Reduce number of rows – obvious
  –   Reduce number of columns – narrower rows, so more rows per buffer
  –   Reduce column width – buffers again
  –   SQL command instead of the Table or view access mode
  –   FastParse on Flat File sources (numeric and date/time columns only)

• DEMO!
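To see why dropping and narrowing columns matters, here is a back-of-the-envelope row width calculation in Python. It assumes approximate sizes: 4 bytes for DT_I4, 8 for DT_I8, 1 byte per character for DT_STR and 2 per character for DT_WSTR. It ignores the engine's own per-row overhead. The table and column names are hypothetical.

```python
from typing import List, Tuple

Column = Tuple[str, str, int]  # (name, SSIS data type, length in characters; 0 for fixed types)

# Assumption: approximate widths only; real buffer rows also carry engine overhead.
FIXED_WIDTH = {"DT_I4": 4, "DT_I8": 8}
BYTES_PER_CHAR = {"DT_STR": 1, "DT_WSTR": 2}  # DT_WSTR is Unicode: 2 bytes per character


def column_width(dtype: str, length: int) -> int:
    if dtype in FIXED_WIDTH:
        return FIXED_WIDTH[dtype]
    if dtype in BYTES_PER_CHAR:
        return BYTES_PER_CHAR[dtype] * length
    raise ValueError(f"type not covered by this sketch: {dtype}")


def row_width(columns: List[Column]) -> int:
    return sum(column_width(dtype, length) for _, dtype, length in columns)


# Hypothetical "Table or view" read: every column comes along.
all_columns = [
    ("CustomerID", "DT_I4", 0),
    ("Name", "DT_WSTR", 100),
    ("Notes", "DT_WSTR", 4000),
    ("Region", "DT_WSTR", 50),
]

# Hypothetical SQL command: Notes dropped, Region CAST to varchar(20).
needed_columns = [
    ("CustomerID", "DT_I4", 0),
    ("Name", "DT_WSTR", 100),
    ("Region", "DT_STR", 20),
]

print(row_width(all_columns), row_width(needed_columns))  # 8304 224
```

Under these assumptions, the trimmed query carries roughly 37 times fewer bytes per row. That translates directly into more rows per buffer and fewer buffers.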
Examining the Pipeline - Transforms
• Also needs design consideration/tuning

• Similar principles to Source apply
    – Reduce number of rows (e.g. Conditional Split)
    – Reduce number of columns
    – Perform transforms in the source query where possible (Sort, Aggregate, Trim)

• Synchronous
    – Streaming, Row-based
• .. vs Asynchronous
    – Partially Blocking, Blocking (beware – memory!)

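• Buffers: synchronous transforms reuse the incoming buffer (pointer
  passing); asynchronous transforms create new buffers and copy rows

• The DEMO will explain the Dam!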
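The synchronous vs asynchronous distinction (and the “dam”) can be sketched with Python generators. This is an analogy, not how the SSIS engine is implemented, and all names are hypothetical. `trim_transform` streams each row on as it arrives. `sort_transform` must read and hold the entire input before it can release its first row.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def counting_source(n: int, stats: Dict[str, int]) -> Iterator[Row]:
    """Yields n rows in descending id order and records how many were read."""
    for i in range(n, 0, -1):
        stats["read"] += 1
        yield {"id": i, "name": f"  row{i}  "}


def trim_transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Synchronous (streaming): each row is passed on as soon as it arrives."""
    for row in rows:
        yield {**row, "name": str(row["name"]).strip()}


def sort_transform(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Asynchronous, fully blocking: reads and holds every row before emitting one."""
    held: List[Row] = list(rows)  # the whole input sits in memory here (the dam)
    held.sort(key=lambda r: r[key])
    yield from held


streaming_stats = {"read": 0}
first = next(trim_transform(counting_source(1000, streaming_stats)))
print(first, streaming_stats["read"])  # {'id': 1000, 'name': 'row1000'} 1

blocking_stats = {"read": 0}
first_sorted = next(sort_transform(counting_source(1000, blocking_stats), "id"))
print(first_sorted, blocking_stats["read"])  # {'id': 1, 'name': '  row1  '} 1000
```

The streaming transform read one row to produce its first output, while the blocking one read all 1,000. This is why a Sort in the data flow is a memory concern and why pushing it into the source query often helps.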
Examining the Pipeline - Destination
• Choice of Destination component
   – Fastest is the SQL Server Destination, with a limitation: the
     package must run on the same machine as the target SQL Server

• Fast Load with OLE DB Destination – batch based

• Table lock ON, Check constraints OFF

• Maximum insert commit size
   – 0 for heaps
   – 10k to 1M rows for B-Trees (tables with a clustered index)
   – The default value commits the whole batch at once. May lead to
     long-held locks and a large transaction log

• Indexes – drop before the load, then re-create afterwards

• DEMO!
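The effect of Maximum insert commit size can be modelled in Python with SQLite from the standard library standing in for SQL Server. That substitution is an assumption made only to keep the example self-contained. The hypothetical `load_with_commit_size` commits every N rows and treats 0 as a single commit at the end. Fewer, larger commits mean locks and log space are held for longer.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, str]
INSERT_SQL = "INSERT INTO target (id, val) VALUES (?, ?)"


def load_with_commit_size(conn: sqlite3.Connection, rows: Iterable[Row], commit_size: int) -> int:
    """Insert rows one by one, committing every `commit_size` rows.
    A commit_size of 0 means one commit after all rows are inserted.
    Returns the number of commits (transactions) used."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute(INSERT_SQL, row)
        pending += 1
        if commit_size > 0 and pending == commit_size:
            conn.commit()  # this batch's locks (and, in SQL Server, log space) can be released
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # the remainder, or everything when commit_size == 0
        commits += 1
    return commits


def demo(commit_size: int, total_rows: int = 25_000) -> Tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
    commits = load_with_commit_size(conn, ((i, f"v{i}") for i in range(total_rows)), commit_size)
    loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return commits, loaded


print(demo(10_000))  # (3, 25000)
print(demo(0))       # (1, 25000)
```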
Updates – where SQL shines
• The only dedicated component for updates within a data flow is the
  OLE DB Command transform

• It's synchronous and row-by-row
   – So this is good, right?
   – No! An UPDATE statement is issued for each row

• Instead, lean on SQL Server for a set-based approach: stage the
  changed rows in a (temp) table, then apply one UPDATE via the
  Execute SQL Task

• DEMO!
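Here is the set-based pattern next to the row-by-row one, again using SQLite from the Python standard library as a stand-in for SQL Server. The SQLite UPDATE uses correlated subqueries. In T-SQL you would more typically write UPDATE ... FROM with a JOIN to the staging table. Table and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Change = Tuple[int, str]


def make_target(n: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO target VALUES (?, ?)", [(i, "old") for i in range(1, n + 1)])
    conn.commit()
    return conn


def row_by_row_update(conn: sqlite3.Connection, changes: List[Change]) -> int:
    """Mimics the OLE DB Command transform: one UPDATE statement per input row."""
    statements = 0
    for id_, val in changes:
        conn.execute("UPDATE target SET val = ? WHERE id = ?", (val, id_))
        statements += 1
    conn.commit()
    return statements


def set_based_update(conn: sqlite3.Connection, changes: List[Change]) -> int:
    """Stage the changes, then apply them all with one UPDATE statement."""
    conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", changes)  # the data flow's job
    conn.execute(
        "UPDATE target "
        "SET val = (SELECT s.val FROM staging AS s WHERE s.id = target.id) "
        "WHERE id IN (SELECT id FROM staging)"
    )  # the Execute SQL Task's job
    conn.execute("DROP TABLE staging")
    conn.commit()
    return 1  # UPDATE statements issued


changes = [(i, "new") for i in range(1, 1001, 2)]  # every odd id changes
a, b = make_target(1000), make_target(1000)
print(row_by_row_update(a, changes), set_based_update(b, changes))  # 500 1
same = (a.execute("SELECT * FROM target ORDER BY id").fetchall()
        == b.execute("SELECT * FROM target ORDER BY id").fetchall())
print(same)  # True
```

Both approaches end with the same data. The difference is 500 round trips and statement executions versus one.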
Tweaking the Data Flow Task
• Adjust data flow default buffer size
   – DefaultBufferMaxRows
   – DefaultBufferSize (bytes)
   – EngineThreads, increased incrementally

• Beware of oversizing buffers – they can spool to disk, at:
   – BufferTempStoragePath
   – BLOBTempStoragePath

• DEMO!
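Below is a simplified Python model of how the two buffer properties interact. The engine fits as many rows as the estimated row width allows within DefaultBufferSize, capped by DefaultBufferMaxRows. The defaults used (10,000 rows and 10,485,760 bytes) are the usual SSIS defaults. The real engine also applies internal minimum and maximum buffer sizes, which this sketch ignores.

```python
DEFAULT_BUFFER_MAX_ROWS = 10_000        # DefaultBufferMaxRows default
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # DefaultBufferSize default: 10,485,760 bytes


def rows_per_buffer(row_width_bytes: int,
                    max_rows: int = DEFAULT_BUFFER_MAX_ROWS,
                    buffer_size: int = DEFAULT_BUFFER_SIZE) -> int:
    """Whichever limit is hit first (row count or bytes) caps the buffer.
    Simplification: at least one row per buffer, no engine min/max limits."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return max(1, min(max_rows, buffer_size // row_width_bytes))


def buffers_needed(total_rows: int, row_width_bytes: int,
                   max_rows: int = DEFAULT_BUFFER_MAX_ROWS,
                   buffer_size: int = DEFAULT_BUFFER_SIZE) -> int:
    per_buffer = rows_per_buffer(row_width_bytes, max_rows, buffer_size)
    return -(-total_rows // per_buffer)  # ceiling division


print(rows_per_buffer(224), rows_per_buffer(8304))                        # 10000 1262
print(buffers_needed(1_000_000, 224), buffers_needed(1_000_000, 8304))    # 100 793
print(rows_per_buffer(224, max_rows=50_000))                              # 46811
```

With narrow rows, DefaultBufferMaxRows is the limit, so raising it lets each buffer carry more rows. Once DefaultBufferSize becomes the cap (the last line), raising the row limit further achieves nothing unless the byte size also grows.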
Taking it further…
• Raw Files to isolate stages of a transformation

• Table Partitioning
   – Load a work table, then switch it in as a partition
   – Parallel, parameter-based loads
   – Can also align partitions with physical storage

• Balanced Data Distributor – for when the source is not the bottleneck. Good for heaps.

• Adjust MaxConcurrentExecutables at package level. Default (-1) =
  number of logical cores + 2

• Network packet size – default 4096 bytes (4 KB). Can be increased in the
  connection string. Recommend 32767 (the maximum, about 32 KB) for less
  network overhead
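The two numeric settings on this slide are easy to reason about with a little arithmetic. The Python sketch below resolves MaxConcurrentExecutables using the -1 rule stated above. It also estimates how many network packets a payload needs, ignoring per-packet header overhead (an approximation). Function names are hypothetical.

```python
import os


def effective_max_concurrent(setting: int, logical_cores: int) -> int:
    """-1 means 'logical cores + 2' (per the slide); a positive value is used as-is."""
    if setting == -1:
        return logical_cores + 2
    if setting <= 0:
        raise ValueError("expected -1 or a positive integer")
    return setting


def packets_needed(payload_bytes: int, packet_size: int) -> int:
    """Rough packet count for a payload; ignores header overhead."""
    return -(-payload_bytes // packet_size)  # ceiling division


print(effective_max_concurrent(-1, os.cpu_count() or 1))  # depends on this machine
payload = 100 * 1024 * 1024  # a hypothetical 100 MB transfer
print(packets_needed(payload, 4096), packets_needed(payload, 32767))  # 25600 3201
```

Roughly eight times fewer packets for the same data is where the "less network overhead" comes from.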
General Tips
• Use views or stored procs in sources where
  possible

• Data compression (row/page) = less I/O

• Available Server Memory
  – SSIS memory is NOT part of the SQL Server database engine memory pool
  – Rule of thumb:
     • Host OS up to 1.5 GB
     • SQL DBMS, SSRS, SSAS all have separate allocations
     • SSIS uses what's left – important for large, fully cached Lookups
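The memory rule of thumb can be turned into a quick capacity check. This Python sketch uses made-up server numbers. It also uses a deliberately naive estimate for a full-cache Lookup (rows times row width). A real cache needs more than that, so treat the result as a lower bound.

```python
GIB = 1024 ** 3


def memory_left_for_ssis(total_gb: float, os_gb: float = 1.5, sql_engine_gb: float = 0.0,
                         ssas_gb: float = 0.0, ssrs_gb: float = 0.0) -> float:
    """Rule of thumb from the slide: SSIS gets whatever the OS and other services leave."""
    return max(0.0, total_gb - os_gb - sql_engine_gb - ssas_gb - ssrs_gb)


def lookup_cache_fits(rows: int, row_bytes: int, available_gb: float) -> bool:
    """Naive check for a full-cache Lookup: rows x row width vs. available memory."""
    return rows * row_bytes / GIB <= available_gb


# Hypothetical 32 GB server: 20 GB for the database engine, 6 GB for SSAS.
left = memory_left_for_ssis(32, sql_engine_gb=20, ssas_gb=6)
print(left)                                      # 4.5
print(lookup_cache_fits(20_000_000, 100, left))  # True  (about 1.9 GB)
print(lookup_cache_fits(60_000_000, 100, left))  # False (about 5.6 GB)
```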
Benchmarking/Testing
• Initial build, benchmark

• Run at least 3 tests, then average the results (e.g. in Excel)

• Change one thing at a time! Then retest, record results

• Testing – BIDS is not a true representation of production
  performance. Neither is your workstation.

• Can use Perfmon to monitor SSIS counters. Combine
  with SSIS log events (e.g. PipelineComponentTime) and
  DMVs – especially sys.dm_os_wait_stats
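Here is a minimal Python timing harness for the "at least 3 runs, then average" rule. The dtexec call in the comment is only an example of what you might time, and the package path is hypothetical.

```python
import statistics
import time
from typing import Callable, List, Tuple


def benchmark(run: Callable[[], object], runs: int = 3) -> Tuple[float, List[float]]:
    """Time `run` several times; return (mean seconds, individual timings)."""
    if runs < 3:
        raise ValueError("run each configuration at least 3 times")
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), timings


# Hypothetical usage against a package on the test server (path is an example only):
#   import subprocess
#   mean, runs = benchmark(lambda: subprocess.run(["dtexec", "/F", r"C:\etl\Load.dtsx"], check=True))
mean, timings = benchmark(lambda: sum(range(100_000)))
print(f"mean {mean:.4f}s over {len(timings)} runs")
```

Keep the individual timings as well as the mean. A large spread between runs is itself a sign that something other than your change is affecting the result.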
Summary
• Start from source and work through to
  destination
• Isolate from downstream or upstream
  components
• Benchmark before you start tuning
• Resist the urge to change many things at
  once
• Create a development/tuning checklist
Thanks for listening!

QUESTION AND ANSWER
Related Links
• http://sqlblog.com/blogs/jamie_thomson
  Jamie Thomson “SSIS Junkie”
• http://toddmcdermid.blogspot.com
  Todd McDermid (Dimension Merge SCD)
• http://blogs.msdn.com/b/mattm/
  Matt Masson - SSIS Team Blog
• http://sqlcat.com, http://sql-server-performance.com
  Look for SSIS best practices
• http://bidshelper.codeplex.com
• http://dimensionmergescd.codeplex.com

2012-03-29 (PASS BI Virtual Chapter) Blazing SSIS! High Performance Design Techniques
