Presentation by Bruce Campbell of Microsoft
Learn about a new capability in SQL Server 2008 R2, Parallel Data Warehouse, formerly known as Project Madison.
SQL Server 2008 R2 Parallel Data Warehouse
1. SQL Server and Data Warehousing
SQL Server 2008 R2 Parallel Data Warehouse Appliance
Speaker: Phil Hummel of WinWire Technologies
Presentation developed by: Bruce Campbell
Western Region Data Warehouse Specialist, Microsoft
Silicon Valley SQL Server User Group
February 16, 2009
Mark Ginnebaugh, User Group Leader,
mark@designmind.com
2. Agenda
• SQL Server 2008 R2 Parallel DW Appliance
– Hardware and Software Architecture
– Case Study
– Customer Experience Opportunities
• Next Steps
3. SQL Server Parallel Data Warehouse
Formerly Project Madison
[Diagram: the Project Madison MPP layer runs on reference hardware platforms built from industry-standard servers, networking, and storage.]
4. Parallel DW Appliance Experience
• All hardware from a single vendor
• Multiple vendors to choose from
• Orderable by the rack or by the cluster
• Vendor will
– Assemble appliances
– Image appliances with OS, SQL Server and Madison software
• Appliance installed in less than a day
• Support –
– Vendor provides hardware support
– Microsoft provides software support
6. Parallel DW - MPP Example
[Diagram: clients connect over ODBC/JDBC using SQL92 with analytical extensions; each query is rewritten into steps that run efficiently on the database servers, which are connected by dual Fibre Channel and dual InfiniBand.]
SELECT location, year,
       SUM(b.sales_amt)
FROM customer a, sales b
WHERE b.sales_amt > 500 AND
      a.custid = b.custid
GROUP BY location, year
ORDER BY 1, 2
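The idea of a query "rewritten into steps" can be illustrated with a small Python simulation. This is a minimal sketch, not PDW's actual planner, and it rests on hypothetical assumptions: both customer and sales are hash-distributed on custid (so the join is node-local), location comes from customer while year and sales_amt come from sales, and the filter applies to sales_amt. The helpers node_for, distribute, node_step, and control_step are illustrative names only. Each node computes partial SUMs per (location, year); the control node merges them and applies ORDER BY 1, 2.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical number of compute nodes

# Hypothetical sample rows for the slide's customer (a) and sales (b) tables.
customers = [
    {"custid": 1, "location": "CA"},
    {"custid": 2, "location": "NY"},
    {"custid": 3, "location": "CA"},
]
sales = [
    {"custid": 1, "year": 2009, "sales_amt": 800},
    {"custid": 1, "year": 2009, "sales_amt": 300},  # removed by the > 500 filter
    {"custid": 2, "year": 2009, "sales_amt": 900},
    {"custid": 3, "year": 2010, "sales_amt": 600},
    {"custid": 3, "year": 2009, "sales_amt": 700},
]

def node_for(custid):
    """Assign a row to a node by its distribution key (simple modulo hash)."""
    return custid % NUM_NODES

def distribute(rows):
    """Split a table into per-node partitions on custid."""
    parts = [[] for _ in range(NUM_NODES)]
    for row in rows:
        parts[node_for(row["custid"])].append(row)
    return parts

def node_step(cust_part, sales_part):
    """Step 1, run on every node: local filter, join, and partial SUM."""
    location_of = {c["custid"]: c["location"] for c in cust_part}
    partial = defaultdict(int)
    for s in sales_part:
        if s["sales_amt"] > 500 and s["custid"] in location_of:
            partial[(location_of[s["custid"]], s["year"])] += s["sales_amt"]
    return partial

def control_step(partials):
    """Step 2, run on the control node: merge partial sums, then ORDER BY 1, 2."""
    total = defaultdict(int)
    for partial in partials:
        for key, amount in partial.items():
            total[key] += amount
    return sorted((loc, yr, amt) for (loc, yr), amt in total.items())

cust_parts = distribute(customers)
sales_parts = distribute(sales)
partials = [node_step(c, s) for c, s in zip(cust_parts, sales_parts)]
result = control_step(partials)
print(result)  # [('CA', 2009, 1500), ('CA', 2010, 600), ('NY', 2009, 900)]
```

The same (location, year) group can show up on more than one node (here CA 2009 is split across two nodes), which is why the final merge on the control node is needed for a correct answer.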
7. Database Servers
• A SQL Server 2008 instance
• SQL as primary interface
• Each MPP node is a highly tuned SMP node with standard interfaces
• DB engine nodes operate autonomously on local data
8. Ultra Shared Nothing
• An extension of traditional shared nothing design
– Push shared nothing architecture into SMP node
• IO and CPU affinity within SMP nodes
– Eliminate contention per user query
– Use full PDW Node resources for each user query
– Multiple physical instances of tables
• Distribute large tables
• Replicate small tables
– Redistribute rows “on the fly” when necessary
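The three table-placement strategies above (distribute large tables, replicate small ones, redistribute on the fly) can be sketched in Python. This is an illustration under stated assumptions, not PDW's implementation: the node count, the CRC32-based hash, the sample tables, and the helper names hash_node, distribute, replicate, and redistribute are all hypothetical.

```python
import zlib

NUM_NODES = 4  # hypothetical appliance size

def hash_node(value, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a deterministic hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes

def distribute(rows, key):
    """Large table: each row is stored on exactly one node, chosen by hashing `key`."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[hash_node(row[key])].append(row)
    return nodes

def replicate(rows):
    """Small table: every node gets a full copy, so joins to it stay local."""
    return [list(rows) for _ in range(NUM_NODES)]

def redistribute(nodes, new_key):
    """Re-hash rows on a different key so matching values land on the same node."""
    moved = [[] for _ in range(NUM_NODES)]
    for node in nodes:
        for row in node:
            moved[hash_node(row[new_key])].append(row)
    return moved

# Hypothetical data: a large fact table and a small dimension table.
sales = [{"custid": c, "store_id": c % 3, "amt": 10 * c} for c in range(1, 21)]
stores = [
    {"store_id": 0, "region": "West"},
    {"store_id": 1, "region": "East"},
    {"store_id": 2, "region": "South"},
]

sales_nodes = distribute(sales, "custid")               # spread by customer
store_nodes = replicate(stores)                         # copied to every node
sales_by_store = redistribute(sales_nodes, "store_id")  # moved for a per-store step
```

The sketch uses `zlib.crc32` rather than Python's built-in `hash()` because string hashing is randomized per process, and row placement must be stable across runs and machines.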
9. Control Node & Client Drivers
• Client connections always go through the control node
– Clustered to a passive node to support High Availability
• Processes SQL requests
• Prepares execution plan
• Orchestrates distributed execution
• Uses a local SQL Server instance for final query plan processing and result aggregation
• Client drivers
– ODBC
– OLE-DB
– ADO.NET
10. Landing Zone
• Provides high-capacity storage for data files from ETL processes
• Supports separating out the workload dedicated to ETL processes
• SSIS available on the landing zone
• Connected to the PDW internal network
• Available as a sandbox for other applications and scripts that run on the internal network
[Diagram: source data files → Landing Zone → Loader → Compute Nodes]
11. Backup Node
• Builds on SQL Server native backup/restore facility
• Executes at InfiniBand network speeds
• Database-level backup
• Subsequent backups are optimized
• Coordinated backup across the nodes
• Quiesces write activity to synchronize the nodes
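A coordinated backup that quiesces writes can be pictured with a toy Python model. This is a minimal sketch under hypothetical assumptions: a single in-process lock stands in for PDW's cross-node write quiescing, a deep copy stands in for SQL Server's native backup on each node, and the names Cluster, write, and coordinated_backup are illustrative only.

```python
import copy
import threading

class Cluster:
    """Toy cluster: each node's data is a dict; one lock gates all writes."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]
        self._write_gate = threading.Lock()

    def write(self, node_id, key, value):
        # Every write passes through the gate, so while a backup holds it,
        # new writes wait and no node changes mid-backup.
        with self._write_gate:
            self.nodes[node_id][key] = value

    def coordinated_backup(self):
        # Quiesce: acquiring the gate waits for any in-flight write to finish
        # and blocks new ones until every node has been copied.
        with self._write_gate:
            # All nodes are copied under the same quiesced state, so the
            # snapshot is consistent across the whole database.
            return [copy.deepcopy(node) for node in self.nodes]

cluster = Cluster(2)
cluster.write(0, "a", 1)
cluster.write(1, "b", 2)
snapshot = cluster.coordinated_backup()
cluster.write(0, "a", 99)  # a later write does not affect the snapshot
```

Holding one gate for the whole copy is what makes the per-node backups line up with each other; backing nodes up independently without quiescing could capture different nodes at different points in time.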
12. Software Architecture
[Diagram: software architecture. Clients: MS BI (AS, RS), Nexus query tool, other third-party tools, and the Admin Console (via IIS), connecting through JDBC, ODBC, OLE-DB, or ADO.NET. Control Node: PDW services (DSQL core engine, DMS, authentication, configuration, queue management) on SQL Server. Compute Nodes: user data in SQL Server, with DMS. Landing Zone: DMS loader, SSIS. Also shown: Backup Node, Management Node (HPC, AD). Legend: existing Microsoft software, built by DWPU, third party.]
15. SQL Server Parallel DW Architecture - HP
[Diagram: HP reference architecture. Active/passive Control Nodes with client drivers; Database Servers running SQL Server, connected by dual Fibre Channel and dual InfiniBand; ETL load interface; corporate backup solution; data center monitoring; spare database server; corporate network and private network. Callouts: MPP architecture, HA built in, linear scalability.]
16. Hub and Spoke – Flexible Business Alignment
• Parallel database copy technology enables rapid data integration and consistency between hub and spokes
• Support user groups with very different SLAs; hot, warm, and cold data; different requirements on data loading, etc.
• Create SQL Server Parallel Data Warehouse, SQL Server 2008, Fast Track Data Warehouse, and SQL Server Analysis Services spokes
A hub-and-spoke solution gives you the flexibility to add or change diverse workloads and user groups while maintaining data consistency across the enterprise.
17. Parallel DW and Fast Track Hub and Spoke
[Diagram: ETL tools load a Central EDW Hub that feeds Departmental Reporting, Regional Reporting, and High Performance HQ Reporting spokes.]
18. Microsoft Released First Technology Preview for Parallel Data Warehouse
• First Technology Preview released on August 14
• DATAllegro’s MPP engine is now ported to SQL Server 2008 and Windows Server 2008
• 10 customers from 7 industries signed up
– First Premier BankCard was the first customer to enlist on Madison
– Internally: ICE, MSIT, ADCenter, XBOX
• Appliances with 8 to 20 nodes now ready to host customer test drives
Early Results
• Data Loading rates of 1 TB per hour
• Query executions at over 1.5 TB per minute
• Madison runs 5 times faster than DATAllegro did on the Ingres DBMS before the acquisition!
Launch of Parallel Data Warehouse:
• Next Technology Preview due early CY2010
• Technology Adoption Program (TAP) due early CY2010
• Nominations now open
• Parallel Data Warehouse to launch in summer 2010
19. Parallel DW Beta Programs
• Two Programs
– MTP – Madison Technology Preview
• 20 – 30 participants
• Duration of 4 to 6 weeks
– TAP – Beta production implementation
• 6 – 8 customers
• First iteration 9 to 12 weeks
20. Parallel DW Beta Programs
• Requirements
– Focus on EDW and large data marts
– Migration projects, not greenfield
– Open to customers & prospects
– 30+ TB of data; at least four at 100+ TB
– Hub-and-spoke in only a select few cases
21. Case Study: First Premier Bankcard
Existing Environment
• Hardware
– 16-CPU HP 8620 Itanium
– Hitachi storage, 27 TB raw
– SATA, 21 LUNs
• Software
– Windows 2003 SP2
– SQL Server 2008
– SSIS/SSRS
• Data Warehouse
– 18 TB
– Star schema
– 80 fact tables
– 500+ dimensions
Current Challenges
• Data load speeds
• Analytic capacity
• Analytic speed
• Mixed workload concurrency
• Total cost of ownership
Madison Highlights
• Data load speeds improved by 300%
• Analytic capacity: 30 TB / 160 cores
• Query speeds improved 70X
• Mixed workload
• TCO lowered by 50%
22. Microsoft Commitment
• MTP
– High-touch support
– MS or partner will provide HW and will host the MTP
– Customer may have opportunity to engage with TAP
– MS will work with customer to define scope and success criteria
– MS will perform the bulk of MTP work (2-3 resources)
• TAP
– Customer must procure the Madison reference architecture and conduct the TAP in their own data center
– Premier support will be provided
– MSFT Services will be provided
– Training / mentoring will be provided
– MS will work with customer to define scope and success criteria
23. Customer Commitment
• MTP
– Customer to provide data, queries, concurrency model, existing data model, etc.
– Customer to provide SME and DBA to answer questions from the MTP team
– Customer to provide existing benchmarks
– Customer to define priorities for testing and areas of interest
– Customer to attend a 2-3 day MTP interactive session and review
• TAP
– Customer to provide data, queries, concurrency model, existing data model, etc.
– Customer to provide SME, DBA and other resources to work with the MS TAP team
– For onsite work, customer to provide building access, internet access, etc.
– Customer to provide PDW Reference Hardware
25. Next Steps
Proof Steps
• Quick Start DW Roadmap Service
• Architectural Design Session
• Madison Technology Preview (MTP)
• Review Madison, SQL Server Classic or Fast Track DW HW/SW configurations and pricing
26. www.bayareasql.org
To attend our meetings or inquire about speaking opportunities, please contact:
Mark Ginnebaugh, User Group Leader mark@designmind.com