Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
 

    Presentation Transcript

    • Data Warehousing with Fast Track and PDW – Assaf Fraenkel, Lead Architect, MCS; Oded Shihor, Senior Solution Architect, HP
    • Which car should you buy? Is it a question of price?!
    • Agenda: Motivation. Fast Track Offering – Balanced Architecture Approach for DW – Example Fast Track Reference Architectures – Optimizing Storage, Load and Maintenance – Case Studies. Parallel Data Warehouse Offering Overview
    • Some SQL Data Warehouses today: Big SAN, Big SMP Server, connected together. What’s wrong with this picture?
    • Answer: system out of balance. This server can consume 16 GB/sec of IO, but the SAN can only deliver 2 GB/sec – Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t – Lots of disks for random IOPS, BUT – Limited controllers mean limited IO bandwidth. System is typically IO bound. Queries are slow. Result: significant investment, not delivering performance
    • You can get more sophisticated… Realize that queries performing complex calculations, format conversions, multi-dimension hash joins, etc. will be more CPU-intensive than others – Complex queries will consume data at a slower per-core rate than simpler queries. Alternative: measure per-core data consumption for a variety of queries, and take the weighted average – A standard approach to capacity planning
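The weighted average described on this slide can be sketched as follows. This is a minimal illustration, not a sizing tool: the function name, the per-query rates and the weights (taken here as how often each query class runs) are all hypothetical.

```python
def weighted_consumption_rate(samples):
    """Weighted average per-core consumption rate in MB/sec.

    samples: list of (mb_per_sec_per_core, weight) pairs, where the weight
    is an assumption about how much each query class matters in the mix.
    """
    total_weight = sum(weight for _, weight in samples)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(rate * weight for rate, weight in samples) / total_weight


# Hypothetical measurements: simple scans consume data faster per core
# than CPU-heavy queries with complex calculations or hash joins.
workload = [
    (300, 5),  # simple scan queries
    (200, 3),  # moderate aggregations
    (100, 2),  # complex, CPU-intensive queries
]

print(weighted_consumption_rate(workload))
```

Replacing the hypothetical pairs with rates measured on your own queries gives a per-core figure that reflects your workload mix.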
    • Or you can leave it to us… We’ve measured a mix of TPC-H queries that reflect a ‘prototype’ Data Warehouse workload. Concluded that SQL Server 2008 R2 on current x64 cores consumes ~200 MB/sec per core on average for this workload. We use this as a basis for the published reference architectures. Your mileage will vary! – For precise system sizing, measure your own workload
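As a rough illustration of how the ~200 MB/sec per core figure turns into a bandwidth target, the sketch below multiplies it by the core count and compares the result with what the storage can deliver. The function names and the 16-core, 2,000 MB/sec example are assumptions made for illustration only.

```python
def required_io_mb_per_sec(cores, per_core_mb_per_sec=200):
    """IO bandwidth the CPUs can consume, using the slide's ~200 MB/sec per core."""
    return cores * per_core_mb_per_sec


def balance_ratio(cores, storage_mb_per_sec, per_core_mb_per_sec=200):
    """Delivered storage bandwidth divided by what the CPUs can consume.

    A value below 1.0 suggests the system is IO bound for this workload.
    """
    return storage_mb_per_sec / required_io_mb_per_sec(cores, per_core_mb_per_sec)


# Hypothetical server: 16 cores fed by storage that delivers 2,000 MB/sec.
print(required_io_mb_per_sec(16))
print(balance_ratio(16, 2000))
```

The per-core default is only the average quoted for the presenters' test workload, so a measured figure for your own queries should replace it.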
    • Potential Performance Bottlenecks (diagram of the IO path from server through FC HBAs, FC switch and storage controller to LUNs and disks). Rates to check along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate
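One way to read this slide is that end-to-end scan throughput is capped by the slowest stage in the path. A minimal sketch under that assumption follows; the stage names come from the slide, while the MB/sec figures are invented for illustration.

```python
def find_bottleneck(stage_rates):
    """Return (stage_name, rate) for the slowest stage in the IO path.

    stage_rates maps a stage name to its aggregate throughput in MB/sec.
    """
    if not stage_rates:
        raise ValueError("at least one stage is required")
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical aggregate rates (MB/sec) for each stage named on the slide.
io_path = {
    "CPU feed rate": 3200,
    "SQL Server read-ahead rate": 3000,
    "HBA port rate": 2400,
    "Switch port rate": 3200,
    "SP port rate": 1600,
    "LUN read rate": 2000,
    "Disk feed rate": 2800,
}

print(find_bottleneck(io_path))
```

With these made-up numbers the storage processor ports would limit the whole system, so adding disks alone would not help.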
    • The Alternative: A Balanced System. Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload. Avoid sharing storage devices among servers. Avoid overinvesting in disk drives – Focus on scan performance, not IOPS. Lay out and manage data to maximize range scan performance and minimize fragmentation
    • Microsoft Data Warehousing – Product Offering (chart; axes: Scale and Complexity). Products, numbered as on the slide: 1 – SQL Server 2008 R2; 2 – SQL Server 2008 R2 with Fast Track Reference Architecture; 3 – PDW; 4 – PDW with Hub-and-spoke. Callouts: 1 – Minimal HW tune up/optimization; 2 – Balanced solution for mostly scan centric workloads; 3 – Max HW tune up for most DW scenarios; 4 – Most flexible architecture for handling all DW scenarios. Other labels: HA by default; supports mixed workloads; SW-HW integration
    • Agenda: Motivation. Fast Track Offering – Balanced Architecture Approach for DW – Example Fast Track Reference Architectures – Optimizing Storage, Load and Maintenance. Parallel Data Warehouse Offering Overview
    • SQL Server Fast Track Data Warehouse: a solution to help customers and partners accelerate their data warehouse deployments. A method for designing a cost-effective, balanced system for Data Warehouse workloads. Reference hardware configurations developed in conjunction with hardware partners using this method. Guidance for data layout, loading and management. Relational Database Only – Not SSAS, SSIS, SSRS
    • Fast Track Scope (diagram). Supporting Systems: Integration Services (ETL), data staging and bulk loading. BI Data Storage Systems: Data Warehouse on SAN / storage array, Subject Area Data Marts, Analysis Services cubes. Presentation Layer Systems: Reporting Services, SharePoint Services, Microsoft Office SharePoint, PerformancePoint, Excel Services, web analytic tools. The Reference Architecture scope is marked with a dashed outline
    • HP Fast Track DL785 G6 Demo
    • Fast Track SQL DW Architecture vs. Traditional DW. Traditional SQL DW Architecture: shared infrastructure – Enterprise shared SAN storage – Shared network bandwidth – Shared with OLTP applications. Fast Track SQL DW Architecture: dedicated DW infrastructure – Architecture modeled after DW appliances – Scalability from 4TB to 80TB – Dedicated network bandwidth – SQL 2008 Data Warehouse on a 4-processor, 16+ core server – Dedicated low-cost SAN arrays, 1 for every 4 CPU cores – Optimized for DW. Benefits: lower TCO; balanced CPU to I/O channel; modular building block approach; scale out or up within limits of server and SAN
    • HP SQL Server Fast Track Data Warehousing: Fast Track G7 Configurations (coming soon). Scales from SMB to Enterprise – Prescriptive guidance and optimized methodology for deploying a data warehouse – Targeted at query workloads patterned for large sequential data reads – Balanced hardware approach. HP provides – Configurations, tested performance, guidance – Best practices for deploying/operating/managing – Packaged and custom support. Configurations: Basic, 8–16TB, DL38x G7 w/ MSA2000 G3; Mainstream, 8–16TB, DL38x G7 w/ MSA P2000 G3; Mainstream, 20–60TB, DL58x G7 w/ MSA P2000 G3; Premium, 40–80TB, DL980 G7 w/ MSA P2000 G3
    • HP SQL Server Fast Track Data Warehousing: Fast Track G7 configurations in test (coming soon). Small SMP, 2-socket configuration – Server: HP ProLiant DL380 G7 with 2x 6-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 8–16TB; 2p, 12 core, 64-192GB RAM. Medium SMP, 4-socket configuration – Server: HP ProLiant DL580 G7 with 4x 8-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 20–40TB; 4p, 32 core, 144-512GB RAM. Large SMP, 8-socket configuration – Server: HP ProLiant DL980 G7 with 8x 8-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 40–80TB; 8p, 64 core, 2TB RAM
    • Fast Track Component Architecture (diagram). Server: SQL Server, Windows Server OS, CPU, host storage adaptor. Storage interconnect. Storage enclosure: storage processor, disk array
    • Core Evaluation Metrics These metrics are used to both validate and position Fast Track Reference Architectures – Maximum Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a standard SQL query. – Benchmark Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a user workload or query. – User Data Capacity – Maximum available SQL Server storage for a specific Fast Track RA assuming 2.5:1 page compression factor.
    • Scaling the IO stack (diagram): a server with eight 4-core CPU sockets and eight HBAs connects through a fiber switch to eight storage enclosures; each enclosure has storage processors serving several RAID-1 disk groups
    • User Data Capacity UDC is the data capacity required – Plan for projected growth • Based on your projections • Needs to be allocated up-front – Allocate for data management needs • Staging database requirements • Temporary objects – Allocate for TempDB • Typically 20-30% of primary data space • Tempdb is not compressed
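The capacity rules on this slide and the previous one can be combined into a back-of-the-envelope estimate. This is only a sketch under stated assumptions: the 2.5:1 compression factor and the 20-30% TempDB guideline come from the slides, while the function name, the growth factor, the staging allowance and the choice to size TempDB from the compressed primary space are illustrative.

```python
def storage_plan_tb(raw_data_tb, growth_factor=1.0, compression=2.5,
                    tempdb_fraction=0.25, staging_tb=0.0):
    """Estimate storage to allocate up front, in TB.

    raw_data_tb     : uncompressed user data today
    growth_factor   : projected growth multiplier (assumption, e.g. 1.5)
    compression     : page compression factor (the slides assume 2.5:1)
    tempdb_fraction : TempDB as a share of primary data space (slide: 20-30%)
    staging_tb      : allowance for staging and temporary objects (assumption)
    """
    primary = raw_data_tb * growth_factor / compression
    tempdb = primary * tempdb_fraction  # TempDB itself is not compressed
    total = primary + tempdb + staging_tb
    return {"primary": primary, "tempdb": tempdb,
            "staging": staging_tb, "total": total}


# Hypothetical case: 20 TB raw data, 50% growth, 1 TB of staging space.
print(storage_plan_tb(20, growth_factor=1.5, staging_tb=1.0))
```

Because the space has to be allocated up front, it is usually safer to round such an estimate up than down.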
    • Storage Layout Implications for SQL Server (diagram): each of the data LUNs (LUN 1, LUN 2, LUN 3, … LUN 16) holds one file per filegroup. Permanent FG (Permanent_DB): Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf, … Permanent_16.ndf. Stage FG (Stage database): Stage_1.ndf, Stage_2.ndf, Stage_3.ndf, … Stage_16.ndf. TempDB (25GB per file): TempDB.mdf, TempDB_02.ndf, TempDB_03.ndf, … TempDB_16.ndf. Local Drive 1. Log LUN 1: Permanent DB log, Stage DB log
    • Sequential Scan Components (diagram): four arrays, each with two volumes and one data file per volume, each labeled 4MB – ARY01D1v01: DB1-1.ndf; ARY01D2v02: DB1-2.ndf; ARY02D1v03: DB1-3.ndf; ARY02D2v04: DB1-4.ndf; ARY03D1v05: DB1-5.ndf; ARY03D2v06: DB1-6.ndf; ARY04D1v07: DB1-7.ndf; ARY04D2v08: DB1-8.ndf. Contiguous allocation, data striping, pre-fetch, and read-ahead work to create efficient sequential IO – Data stripe width is balanced against read-ahead “depth” – Combined, these elements provide effective access to the full data stripe from a single thread. Each element is necessary to maximize efficiency
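The one-file-per-LUN-per-filegroup pattern in this layout can be generated mechanically. The sketch below is illustrative only: the helper name is hypothetical, and the file naming simply follows the pattern visible on the slide (Permanent_1.ndf through Permanent_16.ndf).

```python
def filegroup_layout(filegroup, lun_count=16):
    """Map each data LUN to the single file this filegroup places on it."""
    return {f"LUN {i}": f"{filegroup}_{i}.ndf" for i in range(1, lun_count + 1)}


# Both filegroups spread across the same 16 LUNs, one file each per LUN.
permanent = filegroup_layout("Permanent")
stage = filegroup_layout("Stage")

for lun in ("LUN 1", "LUN 2", "LUN 16"):
    print(lun, permanent[lun], stage[lun])
```

Spreading every filegroup evenly over all LUNs is what lets a single large scan draw on the bandwidth of every LUN at once.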
    • Loading: one of the important topics. I hope you saw the session yesterday. If not, you can watch the video, or see the appendix to this presentation
    • Minimizing File Fragmentation. Pre-allocate database files • Size files correctly to prevent growth • Do not shrink files. Do not use NTFS file defragmentation tools – Rebuild the table to ensure optimal organization at the disk block level. Writing data – Concurrent load operations to the same file will induce fragmentation – DML change operations (Update/Delete) may induce fragmentation. Use filegroups and partitioning to manage concurrent writes for large tables
    • What’s next? My car is too small
    • Agenda: Motivation. Fast Track Offering – Balanced Architecture Approach for DW – Example Fast Track Reference Architectures – Optimizing Storage, Load and Maintenance. Parallel Data Warehouse Offering Overview – Scale Out Architecture Approach for DW – SQL Server in Scale Out Story
    • HP Enterprise Data Warehouse Appliance: transforming today’s SQL (before and after pictures). The world’s most scalable, easy-to-manage enterprise data warehousing solution
    • HP Enterprise Data Warehouse Appliance: COMPLETE, SIMPLIFIED, FOR ANY SCALE
    • HP Enterprise Data Warehouse Appliance: Description. Scale-out of SQL Server: 10s TB ► 100s TB ► PB. Uses massively parallel processing (MPP). Highly optimised for DW workload at each layer of the stack. Uses an index-light approach. Delivers predictable performance at low cost. Simplified deployment and maintenance via appliance model. Integration with existing SQL Server 2008 DW via Hub & Spoke architecture. Lower total cost of ownership
    • HP Parallel Data Warehouse Appliance – Hardware Architecture (diagram). Control Rack – Control nodes (HP ProLiant DL, active/passive): where client apps connect; MPP engine runs here; controls DMS on all nodes; central point for all HW monitoring – Management node: not accessible to outside world; staging place for S/W upgrades and patch deployment; holds S/W images in case a node needs reimaging – Landing Zone: ETL load interface; staging place for data loading; accessible to outside world – Backup node: backup file storage; accessible to outside world; connects to the corporate backup solution. Data Rack – Database (compute) nodes (HP ProLiant DL): store user data; perform local query processing; run data movement service – Storage nodes: HP MSA P2000 G3 – Spare database node. Interconnect: dual Fibre Channel, dual InfiniBand. Networks: corporate network, private network. External: client drivers, data center monitoring
    • Symmetric Multi-Processing vs. Massively Parallel Processing. SMP (SQL Server, Fast Track): OLTP, transactional, data warehousing. MPP (PDW): parallel data warehousing (esp. VLDB, complex workloads)
    • HP Enterprise Parallel Data Warehouse – Impressive live demo. Massive parallel query processing. 106 billion rows; 10 TB table. High performance report without indexing and aggregations
    • Agenda: Motivation. Fast Track Offering – Balanced Architecture Approach for DW – Example Fast Track Reference Architectures – Optimizing Storage, Load and Maintenance. Parallel Data Warehouse Offering Overview – Scale Out Architecture Approach for DW – SQL Server in Scale Out Story
    • Data Distribution with replication (diagram): the Store Sales fact table is split into four distributions, SS[1] to SS[4], alongside the dimension tables. Store Sales: Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, … Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, … Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, … Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, … Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, … Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, … Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
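The idea behind distributing a fact table while replicating dimensions can be sketched in a few lines. This is a toy model, not PDW's implementation: the CRC32 hash, the choice of ss_item_sk as the distribution column and all helper names are assumptions made for illustration.

```python
import zlib


def node_for(key, node_count):
    """Map a distribution-key value to a node index with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, key, node_count):
    """Hash-distribute fact rows: each row lands on exactly one node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key], node_count)].append(row)
    return nodes


def replicate(rows, node_count):
    """Replicate a dimension table: every node gets a full copy."""
    return [list(rows) for _ in range(node_count)]


def quantity_by_item(fact_nodes, item_nodes):
    """Join and aggregate locally on each node, then merge the partial results."""
    totals = {}
    for facts, items in zip(fact_nodes, item_nodes):
        desc = {item["i_item_sk"]: item["i_item_desc"] for item in items}
        for row in facts:
            name = desc[row["ss_item_sk"]]
            totals[name] = totals.get(name, 0) + row["ss_quantity"]
    return totals


NODES = 4  # matches the four distributions SS[1] to SS[4] on the slide
items = [{"i_item_sk": k, "i_item_desc": f"item-{k}"} for k in range(1, 6)]
sales = [{"ss_item_sk": 1 + n % 5, "ss_quantity": n} for n in range(20)]

fact_nodes = distribute(sales, "ss_item_sk", NODES)
item_nodes = replicate(items, NODES)
print(quantity_by_item(fact_nodes, item_nodes))
```

Because every node holds the whole Item table, each node can join its slice of Store Sales without moving rows between nodes.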
    • Distributed Data Warehouse Architecture (diagram): a Central Enterprise DW Hub fed by ETL tools serves Departmental Reporting, Regional Reporting and MS Office 2010. Enterprise data can be maintained on a PDW hub. Hub = unified EDW. Spoke = federated data marts
    • Distributed Data Warehouse Approach: Hub & Spoke model. Enables the DW architecture to more closely match the structure of large enterprises. Separates user and data workloads, eliminating traditional process and resource conflicts. Integrates both SMP and MPP systems with “Shared Nothing”. All systems connect via a dedicated high speed network: dual high speed InfiniBand. Supports multiple workloads on different systems
    • Microsoft Data Warehousing – Product Offering (the positioning chart from earlier in the deck, shown again): 1 – SQL Server 2008 R2; 2 – SQL Server 2008 R2 with Fast Track Reference Architecture; 3 – PDW; 4 – PDW with Hub-and-spoke
    • Resources. SQL Server Fast Track DW Home Page – http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx. Fast Track DW 2.0 Architecture Whitepaper – http://msdn.microsoft.com/en-us/library/dd459178.aspx. Use minimally logged BULK operations (Trace Flag –T 610) – http://msdn.microsoft.com/en-us/library/dd425070.aspx
    • Perspectives: 2010
    • Feedback forms and Facebook (Meirav: to be completed)
    • Let’s Party! Dinner: between 18:30 and 20:30. Transport to the party: shuttles from 20:30. Entry wristbands: provided in the envelopes at room check-in
    • Alternatives for loading. Use a heap – Practical if queries need to scan whole partitions. Or… use a batchsize = 0 – Fine if no parallelism is needed during load. Or… use a Two-Step Load: 1. Load to a Staging Table (heap) with a constraint for deltas 2. INSERT-SELECT from the Staging Table into the Target CI. Resulting rows are not fragmented. Can use parallelism in step 1 – essential for large data volumes
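The shape of the Two-Step Load can be shown with a small runnable sketch. Note the substitution: this uses Python's built-in sqlite3 as a stand-in for SQL Server, so it illustrates only the pattern (an unindexed staging table with a delta constraint, then one ordered INSERT-SELECT into a keyed target), not SQL Server's logging or parallelism. Table and column names are hypothetical.

```python
import sqlite3


def two_step_load(conn, delta_rows, lo, hi):
    """Load delta_rows via a staging table, then INSERT-SELECT into the target.

    lo and hi bound the delta's key range and become a CHECK constraint
    on the staging table, mirroring the 'constraint for deltas' idea.
    """
    cur = conn.cursor()
    # Target table keyed on sale_id, standing in for a clustered index.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS target "
        "(sale_id INTEGER PRIMARY KEY, qty INTEGER)"
    )
    # Step 1: staging table with no index, constrained to the delta range.
    cur.execute(
        "CREATE TABLE staging ("
        f"sale_id INTEGER CHECK (sale_id BETWEEN {int(lo)} AND {int(hi)}), "
        "qty INTEGER)"
    )
    cur.executemany("INSERT INTO staging (sale_id, qty) VALUES (?, ?)", delta_rows)
    # Step 2: a single ordered INSERT-SELECT into the keyed target.
    cur.execute(
        "INSERT INTO target (sale_id, qty) "
        "SELECT sale_id, qty FROM staging ORDER BY sale_id"
    )
    cur.execute("DROP TABLE staging")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(two_step_load(conn, [(3, 30), (1, 10), (2, 20)], 1, 100))
```

On SQL Server, step 1 is where multiple concurrent load streams would run; the sketch keeps it single-threaded for clarity.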
    • Two-Step Load Variations To achieve high parallelism during historical load – Typically into a partitioned table – Use a Staging Table (heap) that is partitioned identically to the Target Table – Use multiple concurrent streams to load the Staging Table with moderate batchsize (SSIS, Bulk Insert, etc) – INSERT-SELECT separate partitions into the Target Table – potentially in parallel • Use ALTER TABLE SET ( LOCK_ESCALATION = AUTO) – Note: If memory is limited, TempDB could be heavily used for sorting
    • Two-Step Load Variations (cont.). To avoid most TempDB space and TempDB IO during load – Use a partitioned Staging Table that is also indexed identically to the Target Table – Load the Staging Table using a moderate batchsize (< 1M rows) – Final INSERT-SELECTs will avoid any sort! • However the staging loads will be logged – Note: Parallelism will be limited if load batches overlap
    • Loading Data. Goal: maximize read performance – Minimizes disk head movement – Maintains high average request size (think ~400k not 8k) – Sustains high average scan rates. Key considerations for a Fast Track data load – Data Architecture: destination table, partitioning, and filegroup – Source Data: format & size – System Resources: CPU & memory. Use minimally logged BULK operations (Trace Flag –T 610) – http://msdn.microsoft.com/en-us/library/dd425070.aspx
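To see why the average request size matters ("think ~400k not 8k"), a quick count of the IO requests needed to scan the same data at each size helps. This is plain arithmetic under the assumption that "k" means KiB; the function name is hypothetical.

```python
def requests_needed(total_bytes, request_bytes):
    """Number of IO requests needed to read total_bytes, rounded up."""
    return -(-total_bytes // request_bytes)  # ceiling division on integers


GIB = 1024 ** 3

small = requests_needed(GIB, 8 * 1024)    # 8 KB requests (single pages)
large = requests_needed(GIB, 400 * 1024)  # ~400 KB requests

print(small, large, small / large)
```

Scanning one GiB takes roughly fifty times fewer requests at ~400 KB than at 8 KB, which is why fragmentation that breaks up large sequential reads hurts scan rates.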