Learn how SQL Server can scale to HUNDREDS of terabytes for BI solutions. This session will focus on Fast Track Solutions and Appliances, Reference Architectures, and Parallel Data Warehousing (PDW). Included will be performance numbers and lessons learned on a PDW implementation and how a successful BI solution was built on top of it using SSAS.
Learn how SQL Server 2008 can scale to HUNDREDS of terabytes for BI/DW solutions. This session will focus on Fast Track Solutions and Appliances, Reference Architectures, and Parallel Data Warehousing (PDW). Included will be performance numbers and lessons learned one of the very first production PDW deployment in the world and how a successful BI solution was built on top of it using SSAS. Learn about all the different appliances and how they can save you a tremendous amount of time and money instead of building on your own: HP Business Decision Appliance (BDA), HP Business Data Warehouse appliance (BDW), HP Enterprise Data Warehouse Appliance (EDW), and HP Database Consolidation Appliance (DBC). If you are involved in the decision making in your company for purchasing one or more servers to be used for SQL Server, this session will make you aware there are options outside of the usual ordering a server and internally installing the hardware, OS, and SQL Server.
Ask how many have a data warehouse, how many heard of and know a little about each appliance
Oracle JD Edwards E1…DBMoto, Rapid Decisions
CPU % stays low
Reference configurations include parts list (servers, networks, etc) and instructions on how to setup and implement. We can build those out for you, or the customer can build themselves. Essentially with the reference config, we’ve done all the work to select the best parts, included HP and MSFT best practices, then tested and verified for optimal performance, with our teams of HP and MSFT engineers.
Appliances solve the bottleneck issues
This is a reference configuration optimized for data warehousing. It scales from 6 to 80 terabytes. You can choose industry-standard hardware from Dell, HP, Bull, IBM, EMC, and other leading vendors (over 10). This balanced hardware approach is ideal for small to mid-sized DW or data-marts with scan-centric workloads. It’s a system you will put together after receiving all the hardware (you need to install the OS, the edition of SQL Server that you’ve purchased, and any other products such as SharePoint and PowerPivot for SharePoint). Note that these systems can be scaled-out by integrating them into an Enterprise Data Warehouse Appliance. Fast Track Data Warehouse starts at under $11,000 per terabyte. Fast Track Data Warehouse hardware is tested for Data Warehousing which eliminates guesswork and is designed to save you months of configuration, setup, testing and tuning. Version 3.0 of the reference architecture was announced February 2011 (see the Fast Track Data Warehouse 3.0 Reference Guide).
Fast Track Data Warehouse is a reference configuration optimized for data warehousing. This saves an organization from having to commit resources to configure and build the server hardware. Fast Track Data Warehouse hardware is tested for data warehousing which eliminates guesswork and is designed to save you months of configuration, setup, testing and tuning. You just need to install the OS and SQL Server. There are currently ten vendors who offer servers based on the Fast Track Data Warehouse architectureto maximize CPU IO consumption
Three approaches to using the FTDW methodologyFull evaluation: Identify query that is representative of workload, calculate the Benchmark Consumption Rate (BCR) for query and Required User Data Capacity (UDC) and compare both to Maximum CPU Consumption Rate (MCR) and Capacity ratings for conforming Fast Track reference architectures.
Three approaches to using the FTDW methodologyFull evaluation: Identify query that is representative of workload, calculate the Benchmark Consumption Rate (BCR) for query and Required User Data Capacity (UDC) and compare both to Maximum CPU Consumption Rate (MCR) and Capacity ratings for conforming Fast Track reference architectures.MCR – like MPG rating for car, BCR – like actual MPG
HP: A one rack system has 17 servers, 22 processors/132 cores, and 125TB and can be scaled out to a four rack system with 47 servers, 82 processors/492 cores, and 500TBHP Business Decision Appliance (BDA): Made specifically for BI. HP and Microsoft have delivered the first ever self-service business intelligence appliance, optimized for SQL Server 2008 R2 and SharePoint Server 2010. Ideal for managed self-service BI with PowerPivot. Developed for mid-market, enterprise department and remote offices. The server has 2 CPU’s (12 cores) and 96GB memory. Configuration of the appliance is integrated into SharePoint. The Windows Server OS, plus all of the required server components, such as SQL Server and SharePoint, are already loaded on the appliance. There’s no need to perform any software installations
The Microsoft and HP Business Data Warehouse appliance optimized for SQL Server 2008 R2, provides the best performance possible for data mart and small data warehouse workloads with data volumes of up to 5 TB. The appliance helps reduce the time and cost of implementing a data warehouse through deployment of an optimized, pre-configured solution that has been designed, tuned, tested by Microsoft and HP. Targeted for lower end. The Windows Server OS and SQL Server are already installed for you.
Quick Deployment Wizard:Name the applianceJoin a domainSpecify appliance administratorsDDL and Data Load WizardCreate staging and production databasesLoad dataCheck data fragmentationThe NEW Dell Quickstart Data Warehouse Appliance will be the first data warehouse appliance in the market to run on SQL Server 2012. It is geared at mid-market and departmental use enabling easy data access, analysis, and Business Intelligence to make better decisions, and the integration needed out of the factory to deliver results faster. This new data warehousing appliance, with Microsoft SQL Server 2012, takes advantage of the Dell Boomi acquisition for data integration and goes to beta test on February 27, 2012
Made specifically for BI. HP and Microsoft have delivered the first ever self-service business intelligence appliance, optimized for SQL Server 2008 R2 and SharePoint Server 2010. Ideal for managed self-service BI with PowerPivot. Developed for mid-market, enterprise department and remote offices. The server has 2 CPU’s (12 cores) and 96GB memory. Configuration of the appliance is integrated into SharePoint. The Windows Server OS, plus all of the required server components, such as SQL Server and SharePoint, are already loaded on the appliance. There’s no need to perform any software installations.
Not specifically for a DW, not a DW appliance, so not Fast Track 3.0 compliant. Essentially a SharePoint server designed to store PowerPivot reports in SharePoint Mode ala (http://msdn.microsoft.com/en-us/library/ee210682.aspx).
Machine Name and Domain Join
SharePoint Starting Page – For End User
Many customers continue to struggle with solving the problem of SQL server sprawl. With a proliferation of databases out there, the complexity and cost of managing a virtualized infrastructure obstructs applications from accessing enterprise-wide data and leads to long provisioning times.The Database Consolidation Appliance is optimized for SQL Server 2008 R2 and designed to deal with just that problem. The solution reduces complexity and mounting costs associated with the proliferation of SQL server data bases, while enabling applications to access enterprise-wide data quickly.It consolidates hundreds of transactional databases into a single, virtual environment, eliminating server sprawl and simplifying management of virtualized infrastructures. Once installed, new SQL Server databases can be provisioned in minutes with predictable performance and migrations can be accomplished with near-zero downtime.The appliance allows customers to create a private cloud with self-service, on-demand scalability, and dynamic elasticity. The appliance is an HP server running Windows Server 2008 R2, SQL Server 2008 R2, and the Hyper-V Cloud stack.It is based on HP ProLiant BL465 servers and StorageWorks P2000 storage with optimized SQL performance and availability for virtualized tier 2 and 3 transactional data base workloads. It rapidly scales up to hundreds of databases per solution. It will be delivered as a complete, pre-installed, pre-tuned hardware with rapid deployment and enhanced manageability. This system is basically a private cloud appliance which is a rack of servers with 192 cores, 400 disk drives (supporting up to 60k SQL Server IOs) and 2 TB of RAM. This can be purchased as a half rack or a full rack appliance.Similar capabilities of the appliance will be offered as a reference architecture to customers who have advanced IT skills and prefer to build their own appliance with best practice guidance from HP and Microsoft. HP and Microsoft will offer support and consulting services for the converged application appliances, including assessment, design, proof of concept, implementation and ongoing support.
Not specifically for a DW, not a DW appliance, so not Fast Track 3.0 compliant. essentially a big Hyper-V appliance where you can use System Center to manage your Physical to Virtual migration of your SQL Server Instances. The extra benefits come from an integrated SCOM configuration that allows you to monitor all of your Virtual Machines. In a half-rack configuration, the DBC Appliance lists for approximately $664,000 ($380,000 for the hardware, plus $284,000 for all of the software). The full-rack configuration lists for $1,230,000 ($660,000 for the hardware, plus $568,000 for the software). These costs don’t include the SQL Server Enterprise Edition licensing costs for the VM guests. However, in many cases, the organization’s current SQL Server Enterprise Edition or Software Assurance licenses will cover the SQL Server licensing requirements.
Proven infrastructure built on best practices of leading expertsSecure, compliant infrastructure for fewer vulnerabilitiesSingle, trusted source of support for simpler resolutionTight integration across applications and infrastructureEnd-to-end, integrated management and virtualization consolesStandards-based technology adapts easily for growthOptimal mix of tuned hardware and softwareHigh-performance, cost-effective infrastructureComponents jointly tested and optimized by HP & Microsoft Validated in customer environments
DatAllegro, was on Ingres on Linux. MS purchased July 2008, took 2.5 years to convert to Windows and SQL ServerRTM on 10/10/10Microsoft’s answer to: Oracle Exadata, Teradata,Netezza, GreenPlum
Massively Parallel Processing (MPP):Ability to leverage multiple concurrent resources to resolve SQL set operations against Distributed data.Each instance works in parallel on its own “distribution” of a single user query.PDW supports up to 10 parallel instances of SQL Server DBMS per Data Rack.Max of 40 Nodes
Key Point: This server CPUs can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/SecInternal software to run all this is Hadoop-like (text searching or data mining vs relational database capabilities): http://quantlabs.net/labs/code/kdb-database/996-hadoopmapreduce-vs-sql-mpp-databasesappliances
Shared Nothing Computing Resource and data independence are maintained within each DBMS instanceEach instance reserves shared resources (CPU, Memory, Disk) for only its distribution of system dataSimply add new resources to continually scale outUltra Shared NothingAn extension of traditional shared nothing designPush shared nothing architecture into SMP nodeIO and CPU affinity within SMP nodesEliminate contention per user queryUse full resources for each user queryPredictable resultsMultiple physical instances of tablesDistribute large tablesReplicate smaller tablesRedistribute rows “on-the-fly” when necessary
Mine: 10 billion rows, 1 second
Partition 2 years by day = 2,568,49340 servers * 8 tables =320 tablesThis horizontal partitioning breaks the table up into 8 partitions per compute node. Each of these distributions (essentially a table in and of itself) have dedicated CPU and disk that is the essence of Massively Parallel Processing in PDW. There are 8 internal disks per compute node.
Full SQL Server functionalityDistributes the workloadAllows existing/new data marts to be fully and easily integrated into the EDWBetter solution for customers than consolidation‘Best of both worlds’ solutionEnables publishingExpand and add spokes without impacting other usersSpokes can be budgeted
Compitition: Oracle Exadata, Teradata,Netezza, GreenPlumMissing Piece: Customers using SQL Server got DW too big, were forced to switch to another vendor. MS is like when SQL Server first came out: beat on price, not as many featuresSSIS data load throughput of up to 285 GB/hour
Fast Track for SQL Server 2012, a reference architecture developed jointly by Dell and Microsoft, combine Dell’s PowerEdge 12th generation servers and Fluid Data Architecture with the benefits of SQL Server software. Delivered as a reference architecture, customers receive step-by-step best practices, eliminating any guesswork on how to balance and configure all the components of the hardware and software. This reduces the risk, cost, and time to successful implementation and gives customers one of the lowest costs for a complete data warehouse solution in the market. This offering includes both data warehousing and major components of an integrated Business Intelligence and Enterprise Information Management platformScalable Analytics & Data Warehousing: Fast Track for SQL Server 2012Fast Track for SQL Server 2012 has significant improvements, leveraging exciting new enhancements in SQL Server 2012 like the new ColumnStore Index feature. You will also see how Fast Track partners are working with the SQL Server team to transform the latest enhancments in server and storage technology into highly optimized data warehouse Reference Archtiectures.
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes
Overview ofMicrosoft AppliancesJames Serra, Business Intelligence ConsultantJamesSerra3@gmail.comJamesSerra.com April 10-12 | Chicago, IL
Please silencecell phones April 10-12 | Chicago, IL
About Me• Business Intelligence Consultant, in IT for 28 years• Owner of Serra Consulting Services, specializing in end-to-end Business Intelligence and Data Warehouse solutions using the Microsoft BI stack• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer• Been perm, contractor, consultant, business owner• MCSE for SQL Server 2012: Data Platform and BI• SME for SQL Server 2012 certs• Contributing writer for SQL Server Pro magazine• Blog at JamesSerra.com 3
Agenda• Why use a Data Warehouse?• Fast Track Data Warehouse (FTDW)• Business Data Warehouse Appliance (BDW)• Business Decision Appliance (BDA)• Database Consolidation Appliance (DBC)• Parallel Data Warehouse (PDW) 4
Why use a Data Warehouse?All these reasons are for data warehouses only (not OLTP):• Reduce stress on production system• Optimized for read access, sequential disk scans• Integrate many sources of data• Keep historical records (no need to save hardcopy reports)• Restructure/rename tables and fields• Use Master Data Management, including hierarchies• No IT involvement needed for users to create reports• Improve data quality and plugs holes in source systems• One version of the truth• Easy to create BI solutions on top of it 5
Why use a Data Warehouse?Legacy applications + databases = chaos Enterprise data warehouse = order Production Finance Continuity Control Consolidation MRP Marketing Control Compliance Inventory Control Sales Collaboration Parts AccountingManagement Single version Logistics Management of the truth Reporting Shipping Enterprise Data Engineering Warehouse Raw Goods ActuarialOrder Control Human Resources Purchasing Every question = decision 6
Some SQL Data Warehouses TodayWhat‟s wrong with this picture??? Get a big SAN… Connect it to the biggest server you can get your hands on Hope for the best! 7
System out of balance!!!• This server CPUs can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec • Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn‟t • Lots of disks for Random IOPS BUT • Limited controllers & Limited IO bandwidth• System is typically IO bound and queries are slow • Despite significant investment in both Server and Storage • Result: Disappointed DBA turning to tuning to squeeze out a bit more performance 8
Potential Performance Bottlenecks DISK DISK SQL SERVER CPU CORES A FC SWITCH FC SERVER WINDOWS A CACHE HBA B LUN CACHE A STORAGE A B CONTROLLER B DISK DISK FC A HBA B B LUNCPU Feed Rate SQL Server HBA Port Rate Switch Port Rate SP Port Rate LUN Read Rate Disk Feed Rate Read Ahead Rate 9
SolutionFast Track Data Warehouse - A reference configuration optimized fordata warehousing. This saves an organization from having to commitresources to configure and build the server hardware. Fast Track DataWarehouse hardware is tested for data warehousing which eliminatesguesswork and is designed to save you months ofconfiguration, setup, testing and tuning. You just need to install the OSand SQL ServerAppliances - Microsoft has made available SQL Server appliances thatallow customers to deploy data warehouse (DW), business intelligence(BI) and database consolidation solutions in a very short time, with allthe components pre-configured and pre-optimized. These appliancesinclude all the hardware, software and services for a complete, ready-to-run, out-of-the-box, high performance, energy-efficient solutions 10
What are Microsoft „Appliances‟Problem: Large percentage of IT projects not successful. Too long/complex to install/deploy/configure/tune. Need toomany experts.Appliances = HW + SW + Services• Hardware from vendor (buy from HP, Dell, etc)• Software from Microsoft: SQL/SharePoint VL (buy from Microsoft)• Services from vendor for entire solution• Optimized for the HW+SW: e.g. 3000+ SW parameters, and 500+ HW parts chosen for larger appliancesMarketing taxonomy (offer customer choice): Reference Guidance Architectures, “Fast Appliances Track” brand • Build it yourself • “Cooking recipe” • Very fast time to value • Custom configurations • Probably higher success • No options (besides „size‟) • High IT expertise • Can be „sold‟ to customers • Tied to HW vendor 11
Microsoft 3 Data Warehouse offerings • DW Appliance SQL Server 2012 • • DW Only MPP – Massive Parallel Parallel Data Warehouse Processing • Scales to 6 PB • SW and HW Reference SQL Server 2012 • • DW Only SMP – runs on 1 Server Fast Track Data Warehouse • Scales > 40TB (in best of conditions) • Customer defines HW • DW or OLTP SQL Server 2012 • SMP – runs on 1 Server • Scales depending on HW, best for < 2 TB DW 12
Fast Track Data Warehouse FT Version 4.0 Benefits: - Pre-Balanced Architectures - Choice of HW platforms - Lower TCO - High Scale - Reduced Risk 15
Fast Track Data Warehouse• Reference architecture• Balanced hardware and database configuration• Storage, server, application settings, configuration settings• Predictable performance, scale from 3 to 80 TB• Data warehouse workload-centric (not one-size-fits-all)• Efficient disk scan (rather than seek) access• Benchmarking procedures• You put together after receiving all the hardware (you need to install the OS, the edition of SQL Server that you‟ve purchased, and any other products such as SharePoint and PowerPivot for SharePoint)• Eliminates guesswork and is designed to save you months of configuration, setup, testing and tuning 16
Fast Track Data Warehouse• SQL Server Best Practices • Data Architecture: Heap Tables, Clustered Index Tables, Table Partitioning • Indexing • Database Statistics • Compression • Managing Data Fragmentation • Loading Data methods 17
Fast Track Data Warehouse Option Pros Cons1. Basic Evaluation Very fast system set-up and procurement Possibility of over-specified storage or (days to weeks) under-specified CPU Minimize cost of design and evaluation Lower infrastructure skill requirements2. Full Evaluation Predefined reference architecture tailored Evaluation takes effort and time (weeks to to expected workload months) Potential for cost-saving on hardware Requires detailed understanding of target Increased confidence in solution workload3. User-defined Potential to reuse existing hardware Process takes several monthsReference Architecture Potential to incorporate latest hardware Requires significant infrastructure System highly tailored for your use-case expertise Requires significant SQL Server expertise 18
Fast Track Data Warehouse• These metrics are used to both validate and position Fast Track RA‟s • Maximum Consumption Rate (MCR) – Ability of SQL Server to process data for a specific CPU and Server combination and a standard SQL query • Benchmark Consumption Rate (BCR) – Ability of SQL Server to process data for a specific CPU and Server combination and a user workload or query • User Data Capacity (UDC) – Maximum available SQL Server storage for a specific Fast Track RA assuming 2.5:1 page compression factor and 300 GB 15K SAS. 30% of this storage should be reserved for DBA operations 19
Appliances• HP Business Data Warehouse Appliance (FT 3.0, 5TB)• HP Business Decision Appliance (BI, SharePoint 2010, SQL Server 2008R2, PP)• HP Database Consolidation Appliance (virtual environment, Windows2008R2)• HP Enterprise Data Warehouse Appliance (1st PDW, SQL2008R2, 610TB)• Dell Quickstart Data Warehouse Appliance 1000 (FT 4.0, 5TB)• Dell Quickstart Data Warehouse Appliance 2000 (FT 4.0, 12TB)• Dell Parallel Data Warehouse Appliance (2nd PDW, SQL2008R2, 600TB)• IBM Fast Track Data Warehouse (FT 4.0, 3 versions: 24TB, 60TB, 112TB)V2 of PDW by HP released, uses SQL Server 2012, quarter-rack (75TB) 21
Business Data Warehouse Appliance• HP and Microsoft tuned and tested (Dell, SQL Server 2012)• Optimized for SQL Server 2008 R2• Data Warehouse up to 5TB• Fast Track 3.0 compliant• Windows Server 2008 R2 Enterprise and SQL Server 2008 R2 Enterprise already installed and configured• Pre-tuned, pre-configured, pre-installed. Turn on and go!• Single point of contact for support• Quick Deployment Wizard and DDL & Data Loading Wizard• Could be spoke in PDW hub and spoke architecture• 2 CPU‟s (12 cores), 96GB memory, 2TB storage• Dell Quickstart Data Warehouse Appliance 1000/2000 (SQL Server 2012) 24
Business Decision Appliance• HP and Microsoft tuned and tested• Made specifically for BI• Optimized for SQL Server 2008 R2 and SharePoint 2010• Windows Server 2008 R2 Enterprise, SQL Server 2008 R2 Enterprise, SharePoint 2010, PowerPivot already installed and configured• Pre-tuned, pre-configured, pre-installed. Turn on and go!• Single point of contact for support• Quick Deployment Wizard• 2 CPU‟s (12 cores), 96GB memory 26
Database Consolidation Appliance• HP and Microsoft tuned and tested big Hyper-V environment• Solves problem of SQL Server sprawl• Virtual environment, private cloud, on-demand scalability• New SQL Server databases provisioned in minutes• Pre-installed Windows Server Datacenter 2008 R2, SQL Server 2008 R2 Enterprise, Hyper-V, System Center Suite• Microsoft Database Consolidation 2012 software to manage the appliance• Automatic load balancing, high availability• Pre-tuned, pre-configured, pre-installed. Turn on and go!• 192 cores, 400 disk drives, 2TB memory as a reference architecture• Offered as a reference architecture 30
Database Consolidation ApplianceDesign, Build & Deploy in weeks rather than months Custom-built solution Integrated & Optimized Appliance Assess and understand workload Define architecture 1 Design Design Choose appliance for workload Evaluate alternatives 1 Design specific implementation Build Acquire appliance Acquire HW & SW components 2 Install appliance Build solution 2 Build Load data Deploy Extract & load data Weeks Proof Of Concept & Validation 3 Stand-up in production Months Tune & Balance HW & SW 3 Integrate in environment Monitor & Manage Deploy Burn in & Stability 4 Extract and manipulate data Monitor and troubleshoot Use Extract and manipulate data Generate reports 4 Use Generate reports 5 Make decisions Make decisions 31
Parallel Data Warehouse (PDW) Scale Out for both Performance and Capacity simultaneously by adding racks A prepackaged or pre-configured balanced set of hardware (servers, memory, storage and I/O channels), software (operating system, DBMS and management software), service and support, sold as a unit with built-in redundancy for high availability positioned as a platform for data Control Rack warehousing. 10 Node Data Rack HP PDW 4 Rack: HP PDW V2: HP PDW 1 Rack 47 Servers ¼ rack to 7 racks 17 Servers 82 Processors / 492 Cores Up to 56 nodes, 896 Cores 22 Processors /132 Cores 500 TB 15 TB – 6 PB 125 TB 32
SMP vs MPP SMP MPP with PDW• HW advancements increasing • HW advancements increasing ability to scale-up ability to scale-up & scale-out • Scaling is limited • Scaling to 6 PB+ • High end SMP very expensive • Scale out is relatively low cost• Extremely high concurrency for • Relatively high concurrency for some workloads complex workloads• Less than 1-2 TB of data SMP • > 15 TB (typically) up to 6 PB will almost always be better. • Limited SQL Server functionality Usually <10TB • HA is built in• Full SQL Server functionality• HA must be architected in 33
PDW Benefits – Key components all in one package Failover ClustersControl and Dual networksManagement Node (1) Mirrored drivesSingle connection point Hot swap drivefor SQL queries. Single Dual power suppliestouch point for DBAs. Dual cooling fansPatch management.Active Directory. Storage Node (8)Failover Zone (1) 35 Disks each.Server and storage Dual network cards.dedicated to loadingdata. DAS (Direct-Attached Storage) via SAS JBOD. Compute Node (8) A SQL Server 2012 Instance. Highly Tuned SMP.Customer Space (8U) 8 Cores each.ETL Servers, Backup 8 Disks each (TempDB).Servers 34
PDW Benefits – Massive Parallel Processing Query 1 ? Query 1 is standard T- ? SQL submitted to SQL Server on Control ? Node ? ? Query is executed on ? all 10 Nodes ? ? ? Results are sent back ? to client 35
PDW Benefits – Massive Parallel Processing? ? ??? ???? L ???????? Multiple queries are simultaneously ? L ?? ?????? executed across all? ? L ???????? nodes. L ????????? ? L ???????? ???????? PDW supports querying while data L ???????? is loading. L? ? Load L L ???????? ???????? L L ???????? 36
PDW - Data Layout Options Replicated Distributed Ultra shared nothing• A table structure that • A table structure that is • The ability to design a exists as A full copy hashed on a single schema of both within each discrete column and uniformly distributed and DBMS instance. distributed across all replicated tables to nodes on the appliance. minimize data Each distribution is A movement separate physical table • Small sets of data can be in the DBMS. more efficiently stored in full. • Certain set operations are more efficient against full sets of data. 37
PDW – PASS Conference Demo• Using TPC-H Data Model for Retail Store Analytics • PDW Database Size – 100+TB DW • Largest Table - Line_Item_Detail = 600B rows • Remaining Fact and Dimension Tables = 220B rows• PDW Infrastructure – 4 Data Racks • Query 1 ran < 20 seconds 38
PDW – Demo Query Syntax SELECT n_name, r_name , SUM(o_totalprice) AS totalprice, SUM (l_quantity) AS totalqty FROM nation , region , customer, orders , lineitem WHERE r_regionkey = n_regionkey AND n_nationkey = c_nationkey AND c_custkey = o_custkey AND o_orderkey =l_orderkey AND l_shipdate BETWEEN 1997-12-01 AND 1997-12-07 AND o_orderdate BETWEEN 1997-12-01 AND 1997-12-07 GROUP BY n_name , r_name , o_orderstatus HAVING COUNT(l_partkey) > 4 39
PDW – Balanced across servers and withinLargest Table 600,000,000,000Randomly distributed across 40 SQL servers 15,000,000,000In each server randomly distributed to 8 tables 1,875,000,000Each partition – 2 years data partitioned by week 17,979,452As an end user or DBA you think about 1 table: LineItem.You run “select * from LineItem”PDW is an appliance, simple to use!You don‟t care or need to know that there are actually 320 tables representing your 1 logical table.That each of those 320 is using it own clustered index and has range partitioning. 40
PDW – Hub and spoke architecture Departmental Reporting SQL Server High-Performance Regional Central EDW Hub Reporting Reporting SQL Server SQL Server Analysis Services FastTrack Landing Zone ETL Tools 41
Parallel Data Warehouse• Scale-out instead of scale-up• MPP instead of SMP• Ultra shared nothing architecture• Infiniband• Hub-and-spoke architecture with support for SMP spokes• Hardware redundancy, failover clustering• Parallel loading – 1.5TB per hour on 1 rack• High speed scanning – 20 to 35GBps per rack• All appliances can be part of this architecture• SSIS data flow destination component, .net driver• DWLoader.exe• HP (EDW) and Dell• Fills “Missing Piece” for Microsoft 42
PDW v2 Features• xVelocity with Columnstore Index (10-50x faster, updatable)• Windows 2012 storage spaces• SQL Server 2012, Windows Server 2012• Everything is virtualized with Hyper-V• Hyper-V for failover, replacing HPC• DAS via SAS JBOD, instead of SAN• Polybase: Hadoop connector• Upgraded hardware• Direct Query (ROLAP) with Power View and Tabular Model (no cube processing!) 43
Final thoughts and questions• Fast Track for SQL Server 2012 http://bit.ly/10rGNFV• Microsoft private cloud fast track reference architecture: http://bit.ly/10JMiA1• OLTP reference architecture (HP Enterprise Transaction Processing Reference Architecture) http://bit.ly/10JN9AU• OLTP reference appliances (Built on HP ProLiant DL980): http://bit.ly/yWEp9CJames Serra, Business Intelligence ConsultantJamesSerra3@gmail.comJamesSerra.com 44
Resources• Microsoft SQL Server Parallel Data Warehouse (PDW) Explained: http://bit.ly/yyuElC• Microsoft SQL Server Reference Architecture and Appliances: http://bit.ly/y7bXY5• Microsoft‟s Data Warehouse offerings: http://bit.ly/xAZy9h• Microsoft and HP‟s Database Consolidation Appliance: http://bit.ly/yBz2qj• Parallel Data Warehouse (PDW) Version 2: http://bit.ly/10rGJWB• Fast Track Data Warehouse 4.0 Reference Guide: http://bit.ly/10rGNFV• 7 SQL Server Fast Track Data Warehouse FAQs http://bit.ly/ykS1PD• HP Fast Track Solutions for Microsoft SQL Server http://bit.ly/wzSsdd• IBM Reference Configurations for Microsoft SQL Server Fast Track Data Warehouse 4.0 http://ibm.co/zmYihU• Dell SQL Server 2012 Fast Track Data Warehouse http://dell.to/10rH23B• Bull Fast Track http://bit.ly/10rHxKV• Infrastructure Planning and Design Guides for SQL Server: http://bit.ly/AvZQ79 45
Win a Microsoft Surface Pro!Complete an online SESSION EVALUATIONto be entered into the draw.Draw closes April 12, 11:59pm CTWinners will be announced on the PASS BAConference website and on Twitter.Go to passbaconference.com/evals or follow the QR code link displayed onsession signage throughout the conference venue.Your feedback is important and valuable. All feedback will be used to improveand select sessions for future events.
Platinum Sponsor Thank you!Diamond Sponsor April 10-12, Chicago, IL