Microsoft SQL Server Data Warehouses for SQL Server DBAs


This is my presentation for SQL Saturday Philly 2012. The topic is managing SQL Server data warehouses with a look at the SQL Server data warehouse landscape and the challenges that a DBA must prepare for in large DW workloads and BI solutions.

  • This slide shows what we are going to talk about today. We will start off discussing Microsoft’s vision for data warehousing solutions. Then we will discuss the different offerings. Next, we will discuss how you can get support and services to help you get started with your data warehouse and to help accelerate the completion of your solution. Finally, we will end with a discussion of the quick start services to enable you to begin your data warehouse solution quickly.
  • SQL Server 2008 R2 comes in several editions. In this presentation, we will look at 4 different SKUs, each of which has different features that are important for data warehousing. We will drill down to get more information about each edition and the features that are important.
  • Remind them
  • To make sure the query is served from cache: (1) ensure the results of the query will fit in memory; (2) run the query once. The second and subsequent executions should be served from memory. You can tell because the second execution should be much faster than the first. Review: TPC Benchmark™ H data set.
  • Remind them “Your mileage may vary”
  • -E is the primary way we help to ensure longer “runs” of contiguous, logically grouped pages. An extent is eight 8 KB pages, or 64 KB. With -E, 64 extents are allocated at a time: (64 × 64 KB) / 1024 = 4 MB. SQL Server will still fill the 4 MB allocation in groups of eight 8 KB pages at a time, which means that pages can still be interleaved (extent fragmentation) down to the extent level. TF1117 is specific to TempDB, as autogrow should be off for all other databases. A customer may have a database with a specific use case that requires autogrow; this is OK, it just needs to be managed and should not be a major part of the overall workload, because the file will become fragmented. Using autogrow for TempDB is about practicality: it can be hard to pre-allocate TempDB. If they can pre-allocate it, go for it. Review: Using the SQL Server Service Startup Options with Microsoft SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalability.
  • Remember that additional space may be needed during the initial migration of data if moving onto a Fast Track RA, or during the initial load of a new Fast Track RA. Review: Working with tempdb in SQL Server 2005; Planning for tempdb.
  • Workloads often need large amounts of data pages to be in cache; in this case, add memory as needed. Hash joins and sorts can use additional memory to avoid spilling to tempdb, so workloads with many queries and bulk loads that perform hash joins and sorts will benefit from more memory. Review: Troubleshooting Performance Problems in SQL Server 2008; How to: Enable the Lock Pages in Memory Option; Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads.
  • 4 racks in V1. Orderable at the rack level. Required software. 13k price per TB. Pricing and licensing training in resources.
  • Data layout options: Dimension tables are typically replicated. PDW maintains data integrity across all nodes. Fact tables are typically distributed. The data model, table sizes, and workloads must all be considered when choosing between replicated and distributed tables. The following join types are used to achieve Distribution Compatibility:
      − Shared Nothing join: achieves Distribution Compatibility by using compatible Distribution Keys in the SQL join criteria.
      − Ultra Shared Nothing join: achieves Distribution Compatibility through a replicated table; no data movement between nodes is required.
      − Redistribution join: requires data to be dynamically distributed between Compute Nodes to achieve Distribution Compatibility.
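As a rough illustration of the replicated-versus-distributed idea in the note above, the following Python sketch spreads a fact table across nodes by hashing a distribution key and keeps a full copy of a small dimension table on every node, so each node can join locally (the "Ultra Shared Nothing" case). The node count, table contents, and the modulo hash are all assumptions made for the example; PDW's actual distribution function is not described in the source.

```python
from collections import defaultdict

NODES = 4  # hypothetical number of compute nodes

def node_for(key, nodes=NODES):
    """Pick the node that owns a distribution key (simple modulo hash, an assumption)."""
    return key % nodes

def distribute(rows, key_index, nodes=NODES):
    """Split rows across nodes by hashing the distribution key column."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_index], nodes)].append(row)
    return placement

def local_join(fact_rows, dim_rows):
    """Join fact rows to the dimension copy held on the same node; no data movement."""
    lookup = {d[0]: d[1] for d in dim_rows}
    return [(f[0], lookup[f[1]], f[2]) for f in fact_rows if f[1] in lookup]

# Fact table rows: (order_id, product_key, amount), distributed on order_id.
facts = [(1, 106, 30.0), (2, 103, 17.0), (3, 109, 20.0), (4, 106, 25.0)]
# Dimension table rows: (product_key, name), replicated to every node.
products = [(103, "Widget"), (106, "Gadget"), (109, "Gizmo")]

fact_by_node = distribute(facts, key_index=0)
joined = []
for node in range(NODES):
    # Every node joins its own fact slice against its local dimension copy.
    joined.extend(local_join(fact_by_node.get(node, []), products))
joined.sort()
```

If the dimension table were distributed on a different key instead of replicated, rows would have to be moved between nodes before the join, which is the redistribution case.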

    1. 1. Microsoft SQL Server Data Warehouses for SQL DBAs. SQL Saturday Philly, June 9, 2012
    2. 2. http://mssqldude.wordpress.com
    3. 3. Agenda
    4. 4. Microsoft Data Warehousing Offerings (Tier 1 Offerings)
        • Enterprise: Scalable and reliable platform for data warehousing on any hardware. Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs). Software only. Scale up data warehousing. 10s of terabytes.
        • Fast Track Data Warehouse: Reference Architectures offering best price performance for data warehousing. Ideal for data marts or small to mid-sized DWs with scan centric workloads. Reference Architectures (Software and Hardware). Scale up data warehousing. 4–80 terabytes.
        • HP Business DW Appliance: An affordable SMP solution for data warehousing on optimized hardware. Ideal for small data marts or DWs with scan centric workloads. Integrated Appliance (Software and Hardware). Scale up data warehousing. Up to 5 terabytes.
        • Parallel Data Warehouse: Appliance for high end data warehousing requiring highest scalability, performance or complexity. Offers flexibility in hardware and architecture. DW Appliance (Fully integrated Software and Hardware). Scale out data warehousing with massively parallel processing (MPP). 10s–100s of terabytes.
    5. 5. Some Data Warehouses today
        • Big SAN
        • Big SMP Server
        • Connected together
        • What’s wrong with this picture?
    6. 6. Answer: system out of balance
        • This server can consume 12 GB/sec of IO, but the SAN can only deliver 2 GB/sec
            − Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t
        • Queries are slow
            − Despite significant investment in both server and storage
        • Result: significant investment, not delivering performance
    7. 7. The Alternative: A Balanced System
        • Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload
        • Avoid sharing storage devices among servers
        • Avoid overinvesting in disk drives
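The imbalance described on the "system out of balance" slide can be put into numbers with a small Python sketch. It uses the 12 GB/sec and 2 GB/sec figures from that slide; the function name and the idea of reporting a utilization fraction are illustrative assumptions, not part of the presentation.

```python
def bottleneck(cpu_consume_gb_s, storage_deliver_gb_s):
    """Return the effective scan rate and the fraction of CPU I/O capacity in use."""
    effective = min(cpu_consume_gb_s, storage_deliver_gb_s)
    return effective, effective / cpu_consume_gb_s

# Figures from the slide: the server can consume 12 GB/sec, the SAN delivers 2 GB/sec.
effective, utilization = bottleneck(12.0, 2.0)
print(f"Effective scan rate: {effective} GB/s "
      f"({utilization:.0%} of what the server can consume)")

# A balanced design delivers as much as the CPUs can consume.
balanced = bottleneck(12.0, 12.0)
```

In the unbalanced case the slower component sets the pace, so most of the server investment sits idle during scans.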
    8. 8. SQL Server Fast Track Data Warehouse: a solution to help customers and partners accelerate their data warehouse deployments
        • A method for designing a cost-effective, balanced system for Data Warehouse workloads
        • Reference hardware configurations developed in conjunction with hardware partners using this method
        • Best practices for data layout, loading and management
    9. 9. Fast Track components
        • Software: SQL Server 2008 R2 Enterprise; Windows Server 2008 R2
        • Configuration guidelines: physical table structures; indexes; compression; SQL Server settings; Windows Server settings; loading
        • Hardware: tight specifications for servers, storage and networking; ‘per core’ building block
    10. 10. Core Fast Track Metrics
    11. 11. System Benchmarking - MCR
        • 200 MB/s per core
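A minimal sketch of how the 200 MB/s per-core figure on this slide scales to a whole server: multiply it by the number of cores to estimate the I/O throughput the storage has to sustain. The core count below is a hypothetical example, and the helper name is an assumption made for illustration.

```python
MCR_MB_PER_SEC_PER_CORE = 200  # per-core figure taken from the slide

def required_io_mb_s(cores, mcr_per_core=MCR_MB_PER_SEC_PER_CORE):
    """I/O throughput the storage must deliver to keep every core busy."""
    return cores * mcr_per_core

# Hypothetical server: 2 sockets x 6 cores each.
cores = 2 * 6
required = required_io_mb_s(cores)
print(f"{cores} cores need about {required} MB/s of sequential I/O")
```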
    12. 12. Establishing Fast Track MCR
    13. 13. System Benchmarking - BCR
        • Actual Miles Per Gallon
    14. 14. Establishing Fast Track BCR
    15. 15. Fast Track Reference Configurations
        • 2 processor configurations (5–20 TB, 2–3.7 GB/s)
        • 4 processor configurations (20–40 TB, 3.5–7.5 GB/s)
        • 8 processor configurations (40–80 TB, 7.5–14 GB/s)
    17. 17. Software configuration: SQL Server Startup
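The -E startup option discussed in the speaker notes changes how much space is allocated to each file at a time. The arithmetic can be sketched as follows, assuming 8 KB pages, 8 pages per extent, and 64 extents per allocation under -E (the 64-extent figure is taken from the notes' 4 MB result).

```python
PAGE_KB = 8
PAGES_PER_EXTENT = 8
EXTENTS_PER_ALLOCATION_WITH_E = 64  # assumption taken from the speaker notes on -E

extent_kb = PAGE_KB * PAGES_PER_EXTENT                      # size of one extent in KB
allocation_kb = extent_kb * EXTENTS_PER_ALLOCATION_WITH_E   # KB allocated per file per round
allocation_mb = allocation_kb / 1024                        # same amount in MB

print(f"Extent: {extent_kb} KB, allocation with -E: {allocation_mb} MB")
```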
    18. 18. Software configuration: Temp DB
    19. 19. Software configuration: Temp DB & TLOG
    20. 20. DW Server Baseline Configs
    21. 21. Fast Track Data Striping
        • Diagram: FT storage enclosure with RAID-1 disk pairs (Disk 1 & 2 and so on). Primary data volumes ARY01D1v01, ARY01D2v02, ARY02D1v03, ARY02D2v04, ARY03D1v05, ARY03D2v06, ARY04D1v07 and ARY04D2v08 hold data files DB1-1.ndf through DB1-8.ndf; log volume ARY05v09 holds DB1.ldf.
    22. 22. User Databases
    23. 23. Transaction Log
    24. 24. File layout across LUN 1, LUN 2, LUN 3 ... LUN 16
        • Permanent FG (Permanant_DB): Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf ... Permanent_16.ndf
        • Stage FG (Stage database): Stage_1.ndf, Stage_2.ndf, Stage_3.ndf ... Stage_16.ndf
        • Local Drive 1 (TempDB): TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB) ... TempDB_16.ndf (25GB)
        • Log LUN 1: Permanent DB Log, Stage DB Log
    25. 25. Control rack and data racks
        • Control rack: control nodes (active/passive), management nodes, landing node, backup node
        • Data rack: compute nodes (SQL), storage nodes, spare compute node
        • Interconnect: dual Fiber Channel and dual Infiniband on a private network
    26. 26. 1 Data Rack
        • 17 servers
        • 22 procs
        • 132 cores
        • Expand to 4 data racks and quadruple your performance and capacity!
    27. 27. Query Speed in Seconds: PDW time vs. original time
        Query | Orig. time (s) | PDW time (s) | PDW times faster than original
        Q1    | 4200           | 16           | 263x
        Q2    | 1200           | 6            | 200x
        Q3    | 120            | 2            | 60x
        Q4    | 120            | 2            | 60x
        Q5    | 120            | 2            | 60x
        Q6    | 1200           | 4            | 300x
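The "times faster" figures on this chart are simply the original time divided by the PDW time. The Python sketch below recomputes them; the pairing of original and PDW timings per query is an assumption inferred from the chart's values and its speedup labels, since the chart itself did not survive as structured data.

```python
# (original seconds, PDW seconds) per query; pairing inferred from the chart labels.
timings = {
    "Q1": (4200, 16),
    "Q2": (1200, 6),
    "Q3": (120, 2),
    "Q4": (120, 2),
    "Q5": (120, 2),
    "Q6": (1200, 4),
}

# Speedup = original time / PDW time.
speedups = {query: orig / pdw for query, (orig, pdw) in timings.items()}

for query, factor in speedups.items():
    print(f"{query}: {factor:.1f}x faster")
```

Q1 works out to 262.5, which the slide rounds up to 263x.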
    28. 28. Parallel Data Warehouse Appliance: Hardware Architecture
        • Control nodes (active/passive): client drivers
        • Management nodes: data center monitoring
        • Landing node: ETL load interface
        • Backup node: corporate backup solution
        • Compute nodes (SQL) with storage nodes, plus a spare compute node
        • Interconnect: dual Fiber Channel and dual Infiniband; corporate network and private network
    29. 29. Parallel Data Warehouse benefits: Massively Parallel Processing
        • Query 1 is submitted to SQL Server on the Control Node
        • Query is executed on all 10 nodes
        • Results are sent back to the client
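The flow on this slide (submit to the control node, execute on every node, return results) is a scatter-gather pattern. Below is a minimal Python sketch of that pattern using threads as stand-ins for compute nodes. The four data slices, the function names, and the choice of a SUM/COUNT aggregate are all hypothetical; this is not how the appliance is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node slices of a distributed fact table (sales amounts).
node_data = [[30.0, 17.0], [20.0, 17.0], [20.0, 25.0], [14.0, 25.0, 10.0]]

def run_on_node(rows):
    """Each compute node computes a partial aggregate over its own slice."""
    return sum(rows), len(rows)

def control_node_query(slices):
    """Fan the query out to every node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(run_on_node, slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count

total, count = control_node_query(node_data)
print(f"SUM = {total}, COUNT = {count}, AVG = {total / count:.2f}")
```

Because each node only touches its own slice, adding nodes shrinks the slice each one has to scan.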
    30. 30. Parallel Data Warehouse benefits: Massively Parallel Processing
        • Multiple queries are simultaneously executed across all nodes
        • PDW supports querying while data is loading
        • Blazing fast performance by parallelizing queries on highly optimized shared nothing nodes
    32. 32. Software Architecture
        • MPP Engine Coordinator: provides single system image; SQL compilation; global metadata and appliance configuration; global query optimization and plan generation; global query execution coordination; global transaction coordination; authentication and authorization; supportability (hardware and software status)
        • Data Movement Service: data movement across the appliance; distributed query execution operators
        • Client tools: query tool, MS BI (AS, RS), DWSQL, Internet Explorer, third-party tools, other
        • Control Node: IIS, Admin Console, Data Access (OLEDB, ODBC, ADO.NET, JDBC), Core Engine Services, SQL Parser, DMS Manager, MPP Engine Coordinator, Data Movement Service, SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB)
        • Compute Nodes: User Data, SQL Server, Data Movement Service
        • Backup Node and Landing Zone Node: Data Movement Service
    33. 33. Blazing-Fast Performance
        • “400 percent improvement in performance” (First American Title Insurance Company)
        • ColumnStore: now, up to 10x faster³
        • ¹ Source: Microsoft customer evidence, Choice Hotels International
        • ² Source: Microsoft customer evidence, KAS Bank
        • ³ Source: Microsoft customer testing; common data warehousing queries
    34. 34. [Figure: sample sales data with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount]
    35. 35. Batch processing
        • Batch object
        • Column vectors
        • List of qualifying rows
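The three terms on this slide (batch object, column vectors, list of qualifying rows) can be illustrated with a toy Python structure: data is held as one vector per column, and a filter only shrinks the list of qualifying row indexes rather than copying rows. The class, its methods, and the sample values are assumptions made for illustration and do not describe SQL Server's internal implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Batch:
    """A batch of rows stored as one vector per column plus a qualifying-row list."""
    columns: Dict[str, list]                 # column name -> column vector
    qualifying: Optional[List[int]] = None   # indexes of rows still in play

    def __post_init__(self):
        if self.qualifying is None:
            # Initially every row in the batch qualifies.
            row_count = len(next(iter(self.columns.values())))
            self.qualifying = list(range(row_count))

    def filter(self, column: str, predicate: Callable) -> "Batch":
        """Narrow the qualifying rows without copying any column data."""
        vector = self.columns[column]
        self.qualifying = [i for i in self.qualifying if predicate(vector[i])]
        return self

    def sum(self, column: str) -> float:
        """Aggregate one column over the qualifying rows only."""
        vector = self.columns[column]
        return sum(vector[i] for i in self.qualifying)

batch = Batch({
    "ProductKey": [106, 103, 109, 106],
    "SalesAmount": [30.0, 17.0, 20.0, 25.0],
})
total = batch.filter("ProductKey", lambda key: key == 106).sum("SalesAmount")
```

Only the two columns the query touches are read, and the filter leaves the column vectors untouched.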
    36. 36. In a standard scale-out server deployment, multiple report servers share a single report server database. The report server database should be installed on a remote SQL Server instance. The following diagram is an example of a standard scale-out server deployment configuration with the report server database on a remote SQL Server instance.
    37. 37. As another option, you might decide to host the report server database on a SQL Server instance that is part of a failover cluster. The following diagram is an example of a scale-out server deployment configuration where the report server databases are on an instance that is part of a failover cluster.
    38. 38. In addition to the standard scale-out deployment, you might determine that your reporting environment would benefit from a more advanced scale-out deployment configuration. For example, you might decide to use the load-balanced report servers for interactive report processing and add a separate report server computer to process only scheduled reports. The following diagram is an example of this advanced scale-out server deployment configuration.
    39. 39. Log and description
        • Report Server Execution Log: The report server execution log contains data about specific reports, including when a report was run, who ran it, where it was delivered, and which rendering format was used. The execution log is stored in the report server database.
        • Report Server Service Trace Log: The service trace log contains very detailed information that is useful if you are debugging an application or investigating an issue or event. The file is located at Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
        • Report Server HTTP Log: The HTTP log file contains a record of all HTTP requests and responses handled by the Report Server Web service and Report Manager. HTTP logging is not enabled by default. You must modify the ReportingServicesService.exe configuration file to use this feature in your installation. The file is located at Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
    43. 43. Under the properties of your data source, increasing the network packet size for SQL Server minimizes the protocol overhead required to build many small packets. The default value for SQL Server 2008 is 4096. With a data warehouse load, a packet size of 32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don’t change the value in SQL Server using sp_configure; instead, override it in your data source. This can be set whether you are using TCP/IP or Shared Memory.
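One way to override the packet size per data source rather than server-wide is to put it in the connection string. The Python sketch below only assembles such a string; it assumes the provider accepts a `Packet Size` keyword (as ADO.NET-style SQL Server connection strings do), and the server and database names are hypothetical placeholders.

```python
def build_connection_string(server, database, packet_size=32767):
    """Build an ADO.NET-style connection string with an explicit packet size."""
    parts = {
        "Data Source": server,
        "Initial Catalog": database,
        "Integrated Security": "SSPI",
        # Assumed keyword: overrides the 4096-byte default for this data source only.
        "Packet Size": str(packet_size),
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())

# Hypothetical server and database names.
conn_str = build_connection_string("dwserver01", "SalesDW")
print(conn_str)
```

Keeping the setting in the data source leaves other applications on the same instance at the default packet size.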
    46. 46. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.