Microsoft SQL Server Data Warehouses for SQL Server DBAs


This is my presentation for SQL Saturday Philly 2012. The topic is managing SQL Server data warehouses with a look at the SQL Server data warehouse landscape and the challenges that a DBA must prepare for in large DW workloads and BI solutions.

  1. Microsoft SQL Server Data Warehouses for SQL DBAs: SQL Saturday Philly, June 9, 2012
  2. http://mssqldude.wordpress.com
  3. Agenda
  4. Microsoft Data Warehousing Offerings: Tier 1 Offerings
      • Parallel Data Warehouse Appliance: appliance for high-end data warehousing requiring the highest scalability, performance, or complexity. DW appliance (fully integrated software and hardware). Scale-out data warehousing with massively parallel processing (MPP); 10s–100s of terabytes.
      • Fast Track Data Warehouse: reference architectures offering the best price/performance for data warehousing. Ideal for data marts or small to mid-sized DWs with scan-centric workloads. Reference architectures (software and hardware). Scale-up data warehousing; 4–80 terabytes.
      • HP Business DW Appliance: an affordable SMP solution for data warehousing on optimized hardware. Ideal for small data marts or DWs with scan-centric workloads. Integrated appliance (software and hardware). Scale-up data warehousing; up to 5 terabytes.
      • SQL Server Enterprise: scalable and reliable platform for data warehousing on any hardware. Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs); offers flexibility in hardware and architecture. Software only. Scale-up data warehousing; 10s of terabytes.
  5. Some Data Warehouses Today
      • A big SAN and a big SMP server, connected together
      • What's wrong with this picture?
  6. Answer: the system is out of balance
      • This server can consume 12 GB/s of I/O, but the SAN can deliver only 2 GB/s, even when the SAN is dedicated to the SQL Server data warehouse (which it often isn't)
      • Queries are slow, despite significant investment in both server and storage
      • Result: significant investment that does not deliver performance
  7. The Alternative: A Balanced System
      • Design a server and storage configuration that can deliver all the I/O bandwidth the CPUs can consume when executing a SQL Server relational DW workload
      • Avoid sharing storage devices among servers
      • Avoid overinvesting in disk drives
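The imbalance on slide 6 can be put into numbers. A minimal sketch, assuming scan throughput is simply capped by the slower of the two sides (real systems add caching, compression, and contention effects); the function names are hypothetical:

```python
# Sketch of the balance check from slides 6 and 7. Assumption: a scan-heavy
# DW query can only run as fast as the slower of (a) the I/O rate the
# server's CPUs can consume and (b) the I/O rate storage can deliver.


def effective_scan_rate(server_gb_s: float, storage_gb_s: float) -> float:
    """I/O rate (GB/s) a scan-heavy workload can actually sustain."""
    return min(server_gb_s, storage_gb_s)


def server_capacity_used(server_gb_s: float, storage_gb_s: float) -> float:
    """Fraction of the server's I/O consumption capacity that gets used."""
    return effective_scan_rate(server_gb_s, storage_gb_s) / server_gb_s


if __name__ == "__main__":
    # Slide 6: the server can consume 12 GB/s, but the SAN delivers only 2 GB/s.
    print(effective_scan_rate(12.0, 2.0))            # 2.0
    print(f"{server_capacity_used(12.0, 2.0):.0%}")  # 17%
```

With those figures, roughly five-sixths of the server's scan capacity sits idle, which is why the balanced approach sizes storage to what the CPUs can consume.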
  8. SQL Server Fast Track Data Warehouse: a solution to help customers and partners accelerate their data warehouse deployments
      • A method for designing a cost-effective, balanced system for data warehouse workloads
      • Reference hardware configurations developed in conjunction with hardware partners using this method
      • Best practices for data layout, loading, and management
  9. Software
      • SQL Server 2008 R2 Enterprise
      • Windows Server 2008 R2
     Configuration guidelines
      • Physical table structures
      • Indexes
      • Compression
      • SQL Server settings
      • Windows Server settings
      • Loading
     Hardware
      • Tight specifications for servers, storage, and networking
      • "Per core" building block
  10. Core Fast Track Metrics
  11. System Benchmarking: MCR (maximum CPU consumption rate)
      • 200 MB/s per core
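Under the balanced-system approach, a per-core MCR figure translates directly into a storage bandwidth target. A minimal sketch, assuming the 200 MB/s per core shown on the slide and a hypothetical 12-core server; real MCR values have to be measured on the target hardware:

```python
# Hypothetical sizing helper for the Fast Track "per core" approach.
# Assumption: 200 MB/s per core is the illustrative figure from slide 11;
# real MCR values must be measured on the target hardware.
DEFAULT_MCR_MB_S = 200.0


def required_storage_mb_s(cores: int, mcr_per_core_mb_s: float = DEFAULT_MCR_MB_S) -> float:
    """Storage bandwidth (MB/s) needed to keep every core fed at its MCR."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cores * mcr_per_core_mb_s


def is_balanced(cores: int, storage_mb_s: float,
                mcr_per_core_mb_s: float = DEFAULT_MCR_MB_S) -> bool:
    """True if storage can deliver at least what the cores can consume."""
    return storage_mb_s >= required_storage_mb_s(cores, mcr_per_core_mb_s)


if __name__ == "__main__":
    # Hypothetical two-socket, six-cores-per-socket server (12 cores).
    print(required_storage_mb_s(12))  # 2400.0
    print(is_balanced(12, 2000.0))    # False
```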
  12. Establishing Fast Track MCR
  13. System Benchmarking: BCR (benchmark consumption rate)
      • "Actual miles per gallon"
  14. Establishing Fast Track BCR
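Slide 13's "actual miles per gallon" analogy suggests reading BCR as the rate your real queries achieve, set against MCR as the rated figure. A minimal sketch of that comparison, assuming both rates are expressed as MB consumed per CPU second per core and computed from 8 KB pages read; the slides do not show the formula, and the page counts and timings below are made up for illustration:

```python
# Sketch: comparing a measured BCR against the MCR "rated" figure.
# Assumption: both rates are derived from pages read (8 KB each) divided by
# CPU seconds, per core; the slides do not show the exact formula.
PAGE_KB = 8


def consumption_rate_mb_s(pages_read: int, cpu_seconds: float) -> float:
    """MB of data consumed per CPU second."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return pages_read * PAGE_KB / 1024 / cpu_seconds


def bcr_to_mcr_ratio(bcr_mb_s: float, mcr_mb_s: float) -> float:
    """How much of the rated per-core rate the real workload achieves."""
    return bcr_mb_s / mcr_mb_s


if __name__ == "__main__":
    mcr = consumption_rate_mb_s(256_000, 10.0)  # hypothetical simple query
    bcr = consumption_rate_mb_s(256_000, 16.0)  # hypothetical real workload
    print(mcr, bcr, bcr_to_mcr_ratio(bcr, mcr))  # 200.0 125.0 0.625
```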
  15. Fast Track Reference Configurations
      • 2-processor configurations: 5–20 TB, 2–3.7 GB/s
      • 4-processor configurations: 20–40 TB, 3.5–7.5 GB/s
      • 8-processor configurations: 40–80 TB, 7.5–14 GB/s
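The three tiers above can be treated as a lookup table when doing a first-cut sizing. A minimal sketch with the ranges transcribed from slide 15; the class, the function, and the rule "take the smallest tier whose upper capacity bound covers the target" are assumptions for illustration, not Fast Track guidance:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FastTrackConfig:
    processors: int
    min_tb: float
    max_tb: float
    min_gb_s: float
    max_gb_s: float


# Ranges transcribed from slide 15, ordered from smallest to largest.
CONFIGS = (
    FastTrackConfig(2, 5, 20, 2.0, 3.7),
    FastTrackConfig(4, 20, 40, 3.5, 7.5),
    FastTrackConfig(8, 40, 80, 7.5, 14.0),
)


def smallest_config_for(capacity_tb: float) -> Optional[FastTrackConfig]:
    """Smallest configuration whose upper capacity bound covers capacity_tb.

    Assumption: targets below 5 TB still map to the 2-processor tier;
    targets above 80 TB return None (outside the ranges on the slide).
    """
    for cfg in CONFIGS:
        if capacity_tb <= cfg.max_tb:
            return cfg
    return None


if __name__ == "__main__":
    for tb in (3, 20, 30, 60, 100):
        cfg = smallest_config_for(tb)
        print(tb, cfg.processors if cfg else "outside the Fast Track ranges")
```

At the 20 TB and 40 TB boundaries, where two tiers overlap, this rule picks the smaller tier.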
  17. Software Configuration: SQL Server Startup
  18. Software Configuration: TempDB
  19. Software Configuration: TempDB and Transaction Log
  20. DW Server Baseline Configurations
  21. Fast Track Data Striping (FT storage enclosure, RAID-1)
      • Primary data: arrays ARY01–ARY04, each split into two volumes on disks 1 and 2 (ARY01D1v01, ARY01D2v02, ARY02D1v03, ARY02D2v04, ARY03D1v05, ARY03D2v06, ARY04D1v07, ARY04D2v08), one data file per volume (DB1-1.ndf through DB1-8.ndf)
      • Log: volume ARY05v09, holding DB1.ldf
  22. User Databases
  23. Transaction Log
  24. Database file layout
      • LUN 1 through LUN 16: database Permanent_DB, Permanent filegroup (Permanent_1.ndf through Permanent_16.ndf), and database Stage, Stage filegroup (Stage_1.ndf through Stage_16.ndf), one file of each per LUN
      • Local Drive 1: TempDB (TempDB.mdf, TempDB_02.ndf, TempDB_03.ndf through TempDB_16.ndf, 25 GB each)
      • Log LUN 1: Permanent DB log and Stage DB log
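Slide 24's layout follows a regular pattern that is easy to generate when scripting a new build. A minimal sketch, assuming 16 LUNs, 16 equally sized 25 GB TempDB files, and the slide's file naming; the functions are hypothetical planning helpers and do not create anything in SQL Server:

```python
from typing import Dict, List

# Hypothetical generator for the slide 24 layout: one Permanent and one Stage
# data file per LUN, TempDB files on a local drive, and both logs on a log LUN.


def file_layout(luns: int = 16, tempdb_files: int = 16) -> Dict[str, List[str]]:
    layout: Dict[str, List[str]] = {}
    for i in range(1, luns + 1):
        layout[f"LUN {i}"] = [f"Permanent_{i}.ndf", f"Stage_{i}.ndf"]
    # TempDB.mdf plus TempDB_02.ndf ... TempDB_NN.ndf, following the slide's naming.
    layout["Local Drive 1"] = ["TempDB.mdf"] + [
        f"TempDB_{n:02d}.ndf" for n in range(2, tempdb_files + 1)
    ]
    layout["Log LUN 1"] = ["Permanent DB log", "Stage DB log"]
    return layout


def tempdb_total_gb(tempdb_files: int = 16, file_gb: int = 25) -> int:
    """Total TempDB size when every file is pre-sized equally (25 GB on the slide)."""
    return tempdb_files * file_gb


if __name__ == "__main__":
    plan = file_layout()
    print(plan["LUN 1"])              # ['Permanent_1.ndf', 'Stage_1.ndf']
    print(plan["Local Drive 1"][:3])  # ['TempDB.mdf', 'TempDB_02.ndf', 'TempDB_03.ndf']
    print(tempdb_total_gb(), "GB")    # 400 GB
```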
  25. Control Rack and Data Racks
      • Control rack: control nodes (SQL Server, active/passive), management nodes, landing node, backup node
      • Data rack: compute nodes (SQL Server) attached to storage nodes over dual Fibre Channel, plus a spare compute node
      • Nodes are connected by dual InfiniBand on a private network
  26. One data rack: 17 servers, 22 processors, 132 cores
      • Expand to 4 data racks and quadruple your performance and capacity!
  27. Query Speed in Seconds: PDW Time vs. Original Time
      Query   Original (s)   PDW (s)   PDW times faster
      Q1      4200           16        263x
      Q2      1200           6         200x
      Q3      120            2         60x
      Q4      120            2         60x
      Q5      120            2         60x
      Q6      1200           4         300x
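The "times faster" labels on slide 27 are just the ratio of original to PDW query time. A minimal sketch that recomputes them from the chart's values (original time, PDW time per query, as read from the slide); the dictionary and function names are hypothetical:

```python
# Query times in seconds as read from the slide 27 chart: (original, PDW).
QUERY_TIMES = {
    "Q1": (4200, 16),
    "Q2": (1200, 6),
    "Q3": (120, 2),
    "Q4": (120, 2),
    "Q5": (120, 2),
    "Q6": (1200, 4),
}


def speedup(original_s: float, pdw_s: float) -> float:
    return original_s / pdw_s


def rounded_speedups(times: dict) -> dict:
    # Round half up so 4200 / 16 = 262.5 becomes 263, matching the slide's
    # label; Python's built-in round() rounds half to even and would give 262.
    return {q: int(speedup(o, p) + 0.5) for q, (o, p) in times.items()}


if __name__ == "__main__":
    print(rounded_speedups(QUERY_TIMES))
    # {'Q1': 263, 'Q2': 200, 'Q3': 60, 'Q4': 60, 'Q5': 60, 'Q6': 300}
```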
  28. Parallel Data Warehouse Appliance Hardware Architecture
      • Control nodes (SQL Server, active/passive): reached by client drivers over the corporate network
      • Management nodes: connect to data center monitoring
      • Landing node: ETL load interface
      • Backup node: connects to the corporate backup solution
      • Compute nodes (SQL Server) attached to storage nodes over dual Fibre Channel, plus a spare compute node
      • Dual InfiniBand private network between nodes
  29. Parallel Data Warehouse Benefits: Massively Parallel Processing
      • Query 1 is submitted to SQL Server on the control node
      • The query is executed on all 10 nodes
      • Results are sent back to the client
  30. Parallel Data Warehouse Benefits: Massively Parallel Processing
      • Multiple queries are executed simultaneously across all nodes
      • PDW supports querying while data is loading
      • Blazing fast performance by parallelizing queries on highly optimized shared-nothing nodes
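Slides 29 and 30 describe a scatter-gather flow: the control node hands the query to every compute node, each node works only on its own slice of the data, and the partial results are combined before returning to the client. A minimal sketch of that flow, assuming a hypothetical fact table hash-distributed on its first column across 10 simulated nodes. Threads stand in for nodes only to show the control flow; Python threads do not give true CPU parallelism for this kind of work, and nothing about PDW's engine is modeled:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical fact rows: (customer_key, sales_amount).
Row = Tuple[int, float]


def distribute(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Hash-distribute rows across compute nodes on the first column."""
    nodes: List[List[Row]] = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[0]) % node_count].append(row)
    return nodes


def compute_node_sum(node_rows: List[Row]) -> float:
    """Each node aggregates only its own slice (shared nothing)."""
    return sum(amount for _, amount in node_rows)


def control_node_sum(nodes: List[List[Row]]) -> float:
    """Control node: run the step on every node at once, then combine partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(compute_node_sum, nodes))
    return sum(partials)


if __name__ == "__main__":
    rows = [(key, float(key % 7)) for key in range(1_000)]
    nodes = distribute(rows, node_count=10)  # slide 29 mentions 10 nodes
    print([len(n) for n in nodes])
    print(control_node_sum(nodes), sum(a for _, a in rows))
```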
  32. Software Architecture
      • MPP Engine Coordinator: provides a single system image; SQL compilation; global metadata and appliance configuration; global query optimization and plan generation; global query execution coordination
      • Other: global transaction coordination; authentication and authorization; supportability (hardware and software status)
      • Clients: query tools, MS BI (AS, RS), Internet Explorer, DWSQL, third-party tools; data access through OLE DB, ODBC, ADO.NET, and JDBC
      • Control node: IIS, admin console, data access, MPP Engine Coordinator (SQL parser, DMS manager), Data Movement Service, and SQL Server holding DW authentication, DW configuration, DW schema, and TempDB
      • Compute nodes: SQL Server holding user data, plus the Data Movement Service
      • Landing zone node and backup node: Data Movement Service
      • Data Movement Service: data movement across the appliance; distributed query execution operators
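The Data Movement Service's job of moving data across the appliance matters most for joins: rows that must meet have to end up on the same compute node. A minimal sketch of that idea as a hash redistribution (shuffle) followed by independent local joins; the function names and the key-first row layout are hypothetical, and a real DMS may choose other strategies (such as copying a small table to every node) that are not modeled here:

```python
from collections import defaultdict
from typing import List, Tuple

Row = Tuple  # first element is the join key (hypothetical layout)


def shuffle(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Redistribute rows so every row with the same key lands on the same node."""
    nodes: List[List[Row]] = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[0]) % node_count].append(row)
    return nodes


def local_hash_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Ordinary hash join run independently on one compute node."""
    by_key = defaultdict(list)
    for r in right:
        by_key[r[0]].append(r)
    return [l + r[1:] for l in left for r in by_key.get(l[0], [])]


def distributed_join(left: List[Row], right: List[Row], node_count: int) -> List[Row]:
    """Shuffle both inputs on the key, join per node, then gather the results."""
    left_nodes = shuffle(left, node_count)
    right_nodes = shuffle(right, node_count)
    result: List[Row] = []
    for l_part, r_part in zip(left_nodes, right_nodes):
        result.extend(local_hash_join(l_part, r_part))
    return result


if __name__ == "__main__":
    sales = [(k % 5, k) for k in range(20)]             # (product_key, order_id)
    products = [(p, f"product-{p}") for p in range(5)]  # (product_key, name)
    joined = distributed_join(sales, products, node_count=4)
    print(len(joined), sorted(joined)[:2])
```

Because both inputs are hashed on the same key with the same node count, each node's local join sees every matching pair, so the gathered result equals a single-server join.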
  33. Blazing-Fast Performance
      • "400 percent improvement in performance" (First American Title Insurance Company)
      • Now up to 10x faster³ with ColumnStore
      • Sources: ¹Microsoft customer evidence, Choice Hotels International; ²Microsoft customer evidence, KAS Bank; ³Microsoft customer testing, common data warehousing queries
  34. [Figure: a sample fact table with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, and SalesAmount, shown both as rows and as separate column segments; sample order dates run from 20101107 to 20101109]
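The row-versus-column picture on slide 34 comes down to how the same table is laid out. A minimal sketch, assuming a few hypothetical rows in the slide's column order: pivoting rows into one list per column lets a query that only needs SalesAmount read a single column, and repetitive column values compress well. Run-length encoding is used here only to illustrate that idea; SQL Server's actual columnstore compression is more involved:

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "RegionKey", "Quantity", "SalesAmount")


def to_column_store(rows: Sequence[tuple], columns: Sequence[str] = COLUMNS) -> Dict[str, list]:
    """Pivot row-oriented tuples into one list (segment) per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def total_sales(column_store: Dict[str, list]) -> float:
    # Only the SalesAmount segment is read; the other five columns are untouched.
    return sum(column_store["SalesAmount"])


def run_length_encode(values: List) -> List[Tuple[object, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    # Hypothetical sample rows in the slide's column order.
    rows = [
        (20101107, 106, 1, 1, 6, 30.00),
        (20101107, 103, 4, 2, 1, 17.00),
        (20101108, 109, 4, 2, 2, 20.00),
    ]
    store = to_column_store(rows)
    print(total_sales(store))                        # 67.0
    print(run_length_encode(store["OrderDateKey"]))  # [(20101107, 2), (20101108, 1)]
```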
  35. Batch object
      • Column vectors
      • List of qualifying rows
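Slide 35 lists the parts of a batch object. A minimal sketch of the batch idea, assuming operators work on column vectors for a group of rows at once and a filter only updates a per-row qualifying flag instead of copying data; the class and function names are hypothetical, and SQL Server's real batch-mode implementation is internal and far more optimized:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Batch:
    """Hypothetical batch: column vectors plus a qualifying-row flag per row."""
    columns: Dict[str, List[float]]
    qualifying: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        row_count = len(next(iter(self.columns.values()), []))
        if not self.qualifying:
            self.qualifying = [True] * row_count


def apply_filter(batch: Batch, column: str, predicate: Callable[[float], bool]) -> Batch:
    """Evaluate a predicate on one column vector; only the flags change."""
    vector = batch.columns[column]
    batch.qualifying = [q and predicate(v) for q, v in zip(batch.qualifying, vector)]
    return batch


def sum_qualifying(batch: Batch, column: str) -> float:
    """Aggregate a column over the rows that are still flagged as qualifying."""
    return sum(v for v, q in zip(batch.columns[column], batch.qualifying) if q)


if __name__ == "__main__":
    batch = Batch({"Quantity": [6, 1, 2, 5], "SalesAmount": [30.0, 17.0, 20.0, 25.0]})
    apply_filter(batch, "Quantity", lambda qty: qty >= 2)
    print(batch.qualifying)                      # [True, False, True, True]
    print(sum_qualifying(batch, "SalesAmount"))  # 75.0
```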
  36. In a standard scale-out server deployment, multiple report servers share a single report server database. The report server database should be installed on a remote SQL Server instance. The following diagram is an example of a standard scale-out server deployment configuration with the report server database on a remote SQL Server instance.
  37. As another option, you might decide to host the report server database on a SQL Server instance that is part of a failover cluster. The following diagram is an example of a scale-out server deployment configuration where the report server databases are on an instance that is part of a failover cluster.
  38. In addition to the standard scale-out deployment, you might determine that your reporting environment would benefit from a more advanced scale-out deployment configuration. For example, you might decide to use the load-balanced report servers for interactive report processing and add a separate report server computer to process only scheduled reports. The following diagram is an example of this advanced scale-out server deployment configuration.
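The advanced scale-out layout on slide 38 splits traffic by kind: interactive requests spread across the load-balanced report servers, scheduled reports go to one dedicated machine. A minimal sketch that models only that routing decision; in an actual Reporting Services deployment this is handled by a network load balancer and server configuration, not application code, and the class and server names here are hypothetical:

```python
from itertools import cycle
from typing import Iterable


class ReportRouter:
    """Hypothetical router mirroring the advanced scale-out layout on slide 38."""

    def __init__(self, interactive_servers: Iterable[str], scheduled_server: str) -> None:
        servers = list(interactive_servers)
        if not servers:
            raise ValueError("need at least one interactive report server")
        self._interactive = cycle(servers)  # round-robin stands in for the load balancer
        self._scheduled = scheduled_server  # dedicated machine for scheduled reports

    def route(self, request_kind: str) -> str:
        if request_kind == "interactive":
            return next(self._interactive)
        if request_kind == "scheduled":
            return self._scheduled
        raise ValueError(f"unknown request kind: {request_kind!r}")


if __name__ == "__main__":
    router = ReportRouter(["rs-web-1", "rs-web-2"], "rs-sched-1")
    print([router.route("interactive") for _ in range(3)])  # ['rs-web-1', 'rs-web-2', 'rs-web-1']
    print(router.route("scheduled"))                         # rs-sched-1
```

All of these servers would still share the single report server database described on slide 36; keeping scheduled work on its own machine simply stops it from competing with interactive users for CPU.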
  39. Report Server Logs
      • Report Server Execution Log: contains data about specific reports, including when a report was run, who ran it, where it was delivered, and which rendering format was used. The execution log is stored in the report server database.
      • Report Server Service Trace Log: contains very detailed information that is useful if you are debugging an application or investigating an issue or event. The file is located at Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
      • Report Server HTTP Log: contains a record of all HTTP requests and responses handled by the Report Server Web service and Report Manager. HTTP logging is not enabled by default; you must modify the ReportingServicesService.exe configuration file to use this feature in your installation. The file is located at Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
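Because the slide gives the trace and HTTP log location relative to the SQL Server installation folder, a script that collects those files first has to build the full path. A minimal sketch using Python's pathlib, assuming a C:\Program Files installation root and a hypothetical instance folder name; the execution log is not covered because it lives in the report server database rather than on disk:

```python
from pathlib import PureWindowsPath


def ssrs_log_folder(instance_folder: str,
                    program_files: str = r"C:\Program Files") -> PureWindowsPath:
    """Build the LogFiles path described on slide 39.

    Assumptions: Reporting Services is installed under program_files, and
    instance_folder is the instance-specific folder name on that machine.
    """
    return PureWindowsPath(program_files, "Microsoft SQL Server", instance_folder,
                           "Reporting Services", "LogFiles")


if __name__ == "__main__":
    # "MSRS10_50.MSSQLSERVER" is an assumed example folder name; check your server.
    print(ssrs_log_folder("MSRS10_50.MSSQLSERVER"))
```

PureWindowsPath only manipulates the path string, so the sketch runs on any operating system; reading the files would require a real Windows path.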
  43. In the properties of your data source, increasing the network packet size for SQL Server reduces the protocol overhead required to build many small packets. The default value for SQL Server 2008 is 4096 bytes. For a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don't change the value in SQL Server using sp_configure; instead, override it in your data source. This setting applies whether you are using TCP/IP or Shared Memory.
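The benefit of a larger packet size is mostly arithmetic: fewer packets, and so less per-packet overhead, for the same volume of data. A minimal sketch, assuming a simplified model that ignores the protocol header bytes inside each packet, plus a helper that appends the setting to a data source's connection string. The "Packet Size" keyword is the one used by SqlClient-style connection strings (check your provider), and the server and database names are hypothetical:

```python
# Sketch of the packet-size trade-off on slide 43.
# Assumptions: the "Packet Size" connection-string keyword (check your driver)
# and a simplified model that ignores protocol header bytes in each packet.
SQL_SERVER_2008_DEFAULT = 4096  # bytes, per the slide
DW_LOAD_PACKET_SIZE = 32767     # the "32K" value the slide recommends


def packets_needed(payload_bytes: int, packet_size: int) -> int:
    """Number of packets to move payload_bytes (integer ceiling division)."""
    return -(-payload_bytes // packet_size)


def with_packet_size(connection_string: str, packet_size: int = DW_LOAD_PACKET_SIZE) -> str:
    """Override the packet size in the data source rather than with sp_configure."""
    # Range taken from SQL Server's network packet size option; drivers may differ.
    if not 512 <= packet_size <= 32767:
        raise ValueError("packet size must be between 512 and 32767 bytes")
    return f"{connection_string.rstrip(';')};Packet Size={packet_size}"


if __name__ == "__main__":
    one_gib = 2 ** 30
    print(packets_needed(one_gib, SQL_SERVER_2008_DEFAULT))  # 262144
    print(packets_needed(one_gib, DW_LOAD_PACKET_SIZE))      # 32770
    print(with_packet_size("Data Source=dwserver;Initial Catalog=Stage"))
```

Moving 1 GiB therefore takes about eight times fewer packets at 32767 bytes than at the 4096-byte default.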