All up data warehouse – From SMP to Parallel Data Warehousing
Published in: Technology, Business
  • The HP Business Data Warehouse Appliance is a great solution for data warehouse environments with light concurrency requirements and relatively low data volumes. This workload profile is becoming increasingly common as organizations recognize the business value in using data marts and departmental data warehouses as a platform for the increasing use of business analysis tools by information workers at all levels of the business. No longer are data warehouses and BI solutions the exclusive domain of huge enterprises; they are now an increasingly important capability for small to medium businesses and decentralized departments. A growing number of businesses do not have the same concurrency requirements, data volumes, or budgets as large enterprises, but want to create a data warehouse for better reporting, analysis, and decision making.
  • The HP Business Data Warehouse offers a solution for the customers discussed on the previous slide. It’s a solution that is:
    Complete – the appliance comes with all the hardware and software you need, pre-configured for a data warehouse workload based on expertise from HP and Microsoft, and includes support services from a single source.
    Optimized – Experts from Microsoft and HP have designed and tuned the appliance specifically for data warehouse workloads, so you can be sure it will meet your data warehouse requirements with efficient power utilization and built-in security and reliability features.
    Agile – Because the BDW is a single hardware appliance, you can just plug it in, switch it on, and within a very short period you’ll have a working data warehouse. The built-in wizards make the appliance easy to configure and load, enabling your business to start taking advantage of your data warehouse sooner than with a “self-build” solution. And while the BDW is optimized for relatively low data volumes and concurrency, if your business grows significantly you can transfer your BDW software licenses to a Fast Track solution.
  • There are two key scenarios for using the HP Business Data Warehouse appliance:
    A small business or departmental data warehouse for a small group of concurrent users who need to store and analyse up to 5 TB of data.
    A spoke in an Enterprise Data Warehouse “hub and spoke” architecture, where the BDW is used to deliver a subset of the corporate data warehouse to a specific set of users.
  • © 2004 Microsoft Corporation. All rights reserved.
    This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
  • The appliance is a complete solution with the hardware, software, and services needed for a mission-critical data warehouse. The database is highly scalable and can handle workloads of hundreds of terabytes while maintaining performance. The EDW appliance also works with your existing data warehouses and data marts, so you do not have to rip and replace your current investments. You can also use familiar tools such as Microsoft Excel to analyze the data in your data warehouse.
  • Customers will purchase at least two racks for a complete EDW Appliance system.

    [Click]

    The control rack will have control nodes, management nodes, the landing zone, and backup nodes. The data rack will have servers that are compute nodes and storage nodes. Each of these racks and node types will be discussed in more detail.
  • Data layout options:
    Dimension tables are typically replicated
    Parallel Data Warehouse maintains data integrity across all nodes
    Fact tables are typically distributed
    The data model, table sizes, and workloads must all be considered when choosing between replicated and distributed tables

    The following join types are used to achieve distribution compatibility:
    Shared-nothing join: Achieves distribution compatibility by using compatible distribution keys in the SQL join criteria
    Ultra shared-nothing join: Achieves distribution compatibility through a replicated table; no data movement between nodes is required
    Redistribution join: Requires data to be dynamically distributed between compute nodes to achieve distribution compatibility
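
The three join types in the notes above can be illustrated with a small classifier. This is a minimal sketch under stated assumptions, not how the Parallel Data Warehouse optimizer is implemented: `Table`, `classify_join`, and the table names are hypothetical, the join is a single-column equi-join, and the join column carries the same name in both tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    """Hypothetical table descriptor, for illustration only."""
    name: str
    # Hash-distribution column, or None when the table is replicated.
    distribution_key: Optional[str] = None

    @property
    def replicated(self) -> bool:
        return self.distribution_key is None


def classify_join(left: Table, right: Table, join_column: str) -> str:
    """Name the join type from the notes for a single-column equi-join."""
    if left.replicated or right.replicated:
        # Every node already holds a full copy of the replicated table,
        # so no rows have to move between nodes.
        return "ultra shared-nothing join"
    if left.distribution_key == join_column and right.distribution_key == join_column:
        # Both tables are hashed on the join column, so matching rows
        # are already on the same node.
        return "shared-nothing join"
    # Otherwise at least one table must be re-hashed on the join column.
    return "redistribution join"


sales = Table("sales", distribution_key="customer_id")
orders = Table("orders", distribution_key="customer_id")
returns = Table("returns", distribution_key="order_id")
customers = Table("customers")  # replicated dimension table

print(classify_join(sales, customers, "customer_id"))
print(classify_join(sales, orders, "customer_id"))
print(classify_join(orders, returns, "order_id"))
```

The sketch mirrors the guidance in the notes: replicating small dimension tables and distributing fact tables on a commonly joined column keeps most joins in the first two categories.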
  • By taking the traditional idea of a shared-nothing architecture a step further, the EDW appliance does not need to share any information between compute nodes. Each table is duplicated in several places to help with load balancing and fault tolerance. The hardware is redundant and supports automatic failover to the standby hardware to increase the overall system uptime.
  • The Administrative Console is an Internet Information Services (IIS) web application for SQL Server Parallel Data Warehouse that displays the appliance’s state information. Users connect to the Administrative Console through Microsoft Internet Explorer.
  • The Configuration Manager is an appliance administration tool that SQL Server Parallel Data Warehouse system administrators use to perform appliance-level operations and to change appliance-level settings. For example, use the Configuration Manager to reset passwords, set the time zone, change IP addresses, configure SSL certificates, enable remote access through the firewall, start or stop the appliance, and set Instant File Initialization.
  • A distributed data warehouse solution, such as that supported by SQL Server Parallel Data Warehouse, comprises a centralized EDW and a set of loosely coupled data marts. For many years, this has been the preferred approach for enterprise-wide data warehousing, and numerous studies since 2003 confirm that hub and spoke is the most popular data warehouse architecture among DW professionals. Traditionally, implementing a hub and spoke architecture has been challenging due to practical limitations of the database engine and network resources.

    [Click to display types of spoke]
    With SQL Server Parallel Data Warehouse, you can create a diverse range of spoke types: SQL Server Parallel Data Warehouse MPP appliances for user groups that have extreme scalability requirements, Fast Track data warehouse implementations, SQL Server 2008 Enterprise data warehouses, and even SQL Server 2008 Analysis Services OLAP databases.

    [Click to display parallel database copy point]
    However, the SQL Server Parallel Data Warehouse parallel database copy technology enables rapid data integration between spokes and the SQL Server Parallel Data Warehouse hub, making it easier to build hub and spoke solutions that integrate your diverse data marts and the enterprise data warehouse.

    [Click to display multiple-user SLA point]
    The SQL Server Parallel Data Warehouse hub and spoke architecture enables you to support user groups with very different SLAs; supports hot, warm, and cold data; supports different data loading requirements; and more.
  • The EDW appliance can be the central hub in this architecture. The spokes can be anything from a SQL Server departmental data mart to a Fast Track reference implementation, a business decision appliance, or a SQL Server Analysis Services system. EDW is not restricted to any particular model, and the high-speed data copy features enable multiple clients.
  • With so many choices, there are always questions about which solution is right for the organization. These questions help you to determine the correct solution. While there is rarely any one deciding factor, you can find a solution that is optimized for the things that are most important to you.
  • The EDW appliance fits in with your existing data warehouse solutions and will enable you to query and report on the large amount of data stored in the appliance.
  • All up data warehouse – from SMP to parallel

    1. All up data warehouse – From SMP to Parallel Data Warehousing
    2. Take 1 big SAN. Add a little Server. Add a bigger Server. Add more networking.
    3. POTENTIAL PERFORMANCE BOTTLENECKS [Diagram of the I/O path. Components: SQL Server, Windows, CPU cores, and cache on the server; two FC HBAs (ports A and B); FC switch; storage controller (ports A and B) with cache; LUNs and disks. Labeled rates: CPU Feed Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, SQL Server Read Ahead Rate, LUN Read Rate, Disk Feed Rate]
    4. It’s all about … SIZING
    5. One SHOE does not FIT ALL
    6. Transaction processing simplifies and accelerates data capture for accurate business decisions. Data warehousing enables a common data model for a single version of the truth. Analysis leads to optimized business processes and improved performance.
    7. Data Warehouse Scope [Diagram labels: Data Path; Data Warehouse; Analysis Services Cubes; PerformancePoint; Dedicated SAN, Storage Array; Reporting Services; Web Analytic Tools; Integration Services ETL; SharePoint Services; Microsoft Office SharePoint; Data Staging, Bulk Loading; Supporting Systems; BI Data Storage Systems; Presentation Layer Systems; Data Warehouse Scope (dashed); Presentation Data]
    8. Data Warehouse Scenarios • No longer exclusive to large enterprises and specialist analysts • Growth of affordable self-service BI tools such as PowerPivot and Reporting Services has created a DW requirement for smaller businesses and individual departments
    9. Microsoft Data Warehousing Offerings
       • Scalable and reliable SMP platform for data warehousing on any hardware | Scalable and reliable platform for data warehousing on any hardware | Reference architectures offering best price performance for data warehousing | Appliance for high-end MPP data warehousing delivering highest scalability and performance
       • Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs) | Ideal for large data marts or mid-sized EDWs | Ideal for data marts or small to mid-sized data warehouses with scan-centric workloads | Ideal for high-scale or high-performance data marts and EDWs
       • Software only | Integrated Appliance (Software and Hardware) | Reference Architectures (Software and Hardware) | DW Appliance (Fully integrated Software and Hardware)
       • Scale-Up DW | Scale-Up DW | Scale-Up DW | Scale-Out DW with MPP
       • 10s of terabytes | <5 terabytes | 5–80 terabytes | 10s–100s of TB
       • Software Assurance; Premier Mission Critical Support | 3-Year Support Plus 24 | Software Assurance; Premier Mission Critical Support | Mission Critical Advantage Program
       • Enterprise | Fast Track Data Warehouse RA | BDW Appliance | Parallel Data Warehouse
    10. Microsoft Data Warehouse Offerings
        • Effort to Build: Very High | Very Low | Moderate | Moderate | Moderate | Moderate | Very Low
        • Capacity: Variable | 5 TB | 14 TB | 20 TB | 40 TB | 40 TB | 500 TB
        • Concurrency: Variable | Light | Light | Medium | Medium | High | Very High
        • Query Complexity: Variable | Medium | Medium | Medium | Medium | High | Very High
    11. Business Data Warehouse Appliance
    12. Business Data Warehouse Appliance. Agile: • Deploy in hours/days, not in months • Easy to use through built-in dedicated tools to load and manage your data warehouse • Designed for up to 5 TB data warehouses • Fast Track 3.0 compliant, license path to Fast Track. Complete: • Hardware + Software + Services • Pre-tuned, pre-configured, pre-installed. Turn on and go! • Single point of contact for support. Optimized: • Specifically for small to medium data warehouse workloads • Designed for performance, energy efficiency, and value by HP and Microsoft’s best engineers • Security and reliability built in
    13. Scenarios: Small/Departmental Data Warehouse; Spoke in EDW Hub and Spoke Architecture
    14. Reference Architectures
    15. Fast Track Data Warehouse Components. Software: • SQL Server 2008 R2 Enterprise • Windows Server 2008 R2. Configuration guidelines: • Physical table structures • Indexes • Compression • SQL Server settings • Windows Server settings • Loading. Hardware: • Tight specifications for servers, storage, and networking • ‘Per core’ building block
    16. SQL Server Parallel Data Warehouse
    17. SQL Server Parallel Data Warehouse • Tier-1 Enterprise Data Warehouse Appliance Offering – High scalability from tens to hundreds of terabytes – High performance through the MPP system • Flexibility and Choice – Choice of deployment options through distributed architecture • Most Comprehensive Solution – Complete data warehouse solution spanning desktop, enterprise data warehouse, and data marts
    18. [Diagram: appliance nodes, each labeled SQL]
    19. CONTROL NODE • Client connections always go through the control node • Contains no persistent user data • Parallel Data Warehouse advantages: o Processes SQL requests o Prepares execution plan o Orchestrates distributed execution • Local SQL Server processes final query plan and aggregates results • Provided by DataDirect: o Open Database Connectivity (ODBC), Object Linking and Embedding Database (OLE DB), Java Database Connectivity (JDBC), and ActiveX® Data Objects (ADO.NET) client drivers o Wire protocol (SequeLink) o Drivers are available for 32-bit and 64-bit
    20. MANAGEMENT NODE • Provides support and patching for the appliance • Holds image for re-deployment of compute node • Holds Active Directory
    21. LANDING ZONE • Provides high-capacity storage for data files from ETL processes • Is available as a sandbox for other applications and scripts that run on the internal network • Provides SQL Server Integration Services [Diagram labels: Source; Landing Zone Files; Data Loader; Compute Nodes; DWLoader or SQL Server Integration Services]
    22. BACKUP NODE • Provides integrated backup solution • Integrates with third-party backup options • Orderable in different sizes
    23. Data Rack • Servers: 10 active + 1 passive • HP ProLiant DL360 G7 compute nodes • InfiniBand, FC, and Ethernet switching, 42U rack • Expansion: grow from 1–4 data racks, storage options, test/dev system • Storage: 10x HP StorageWorks MSA P2000 G3 • Consists of COMPUTE NODES and STORAGE NODES
    24. Data Rack • Servers: 10 active + 1 passive • HP ProLiant DL360 G7 compute nodes • InfiniBand, FC, and Ethernet switching, 42U rack • Expansion: grow from 1–4 data racks, storage options, test/dev system • Storage: 10x HP StorageWorks MSA P2000 G3. COMPUTE NODE • Each MPP node is a highly tuned symmetric multiprocessing (SMP) node with standard interfaces • Provides dedicated hardware, database, and storage • Runs SQL Server • Spare node provides failover in case of node failure • Drives are configured as RAID 1
    25. PDW – Client Connectivity [Diagram labels: Client Drivers; ETL Load Interface; Support/Patching; Corporate Backup Solution]
    26. PDW – Query Processing [Diagram label: QUERY]
    27. Data Layout Approaches. Replicated: A table structure exists as a full copy within each discrete Parallel Data Warehouse node. Distributed: A table structure is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the database management system (DBMS). Ultra Shared-Nothing: Provides the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes. • Small sets of data can be more efficiently stored in full (replicated). • Certain set operations (such as single-node operations) are more efficient against full sets of data.
    28. Ultra Shared-Nothing Architecture. Extends Traditional Shared-Nothing Design: • Pushes shared-nothing architecture into the SMP node; there is IO and CPU affinity within SMP nodes o Eliminates contention for user queries o Uses full resources for each user query • Provides multiple physical instances of tables o Distributes large tables o Replicates small tables • Redistributes rows as needed. Provides Fault Tolerance: • All hardware components have redundancy (including CPUs, disks, networks, power, and storage processors) • Control and compute nodes use failover clustering • Management nodes have active and standby states
    29. Administrative Console https://controlnodeipaddress • Dashboard • Query activity • Load activity • Backup and restore • Active locks • Active sessions • Alerts • Appliance state
    30. Parallel Data Warehouse Configuration Manager • Appliance topology • Services status • Network configuration • Privileges
    31. Parallel database copy technology enables rapid data movement and consistency between EDW and data marts • Create SQL Server 2008 R2, Fast Track Data Warehouse, and SQL Server Analysis Services data marts • Supports user groups with very different service-level agreements (SLAs): Performance, Capacity, Loading, Concurrency • Flexible Business Alignment: A distributed architecture gives you the flexibility to add or change diverse workloads or user groups while maintaining data consistency across the enterprise
    32. Distributed Data Warehouse Architectures [Diagram labels: Landing Zone; ETL Tools; Departmental Reporting; Regional Reporting; High-Performance Reporting; Central EDW Hub; Regional Reporting with Business Decision Appliance; Third-Party RDBMS; Third-Party Data Integration; Mobile Applications]
    33. Determining the Right Solution. What is the workload? • Number of concurrent users • Query complexity • Query mix • Load processing • Performance requirements. What is the customer looking for in a solution? • Simplicity in the appliance • 100 percent compatibility with SQL Server 2008 R2 • Enterprise scalability • Economical hardware • Incremental expansion and high availability by default
    34. Parallel Data Warehouse. Value to Customer: • Enterprise-class scalability to hundreds of terabytes • High performance • Interoperability with leading BI products • Mission critical support and maintenance • Mature SQL Server platform with high security and robust engineering process • Strong data warehouse vision and roadmap that includes industry-leading technologies. Supporting Features: • MPP with ultra shared-nothing architecture • Distributed query optimization • Balanced hardware with pre-tested and pre-tuned appliances optimized for data warehousing • Third-party product integration (for example, MicroStrategy, Business Objects, and Informatica) • Mission critical support and maintenance • Road map includes column store, petabyte scalability, real-time data warehousing, MDM,
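
Slide 3 names a chain of rates along the I/O path (CPU feed rate, HBA port rate, switch port rate, SP port rate, LUN read rate, disk feed rate). One way to read that diagram is that sustained scan throughput can be no higher than the slowest stage in the chain. The sketch below illustrates that reading only; the function name and every number in it are hypothetical and are not measurements from the deck.

```python
from typing import Dict, Tuple


def find_bottleneck(stage_rates_mb_s: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its rate; that stage caps end-to-end throughput."""
    if not stage_rates_mb_s:
        raise ValueError("at least one stage is required")
    slowest = min(stage_rates_mb_s, key=stage_rates_mb_s.get)
    return slowest, stage_rates_mb_s[slowest]


# Hypothetical aggregate rates in MB/s, chosen only to make the example concrete.
example_path = {
    "CPU feed rate": 1600.0,
    "HBA port rate": 1500.0,
    "Switch port rate": 1500.0,
    "SP port rate": 1200.0,
    "LUN read rate": 1000.0,
    "Disk feed rate": 1400.0,
}

stage, rate = find_bottleneck(example_path)
print(f"Bottleneck: {stage} at {rate:.0f} MB/s")
```

Sizing, in the sense of slide 4, then amounts to choosing components so that no single stage sits far below the others.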

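
Slides 19 and 27 describe a distributed table as hashed on a single column across the compute nodes, with the control node orchestrating execution and aggregating results. The following is a minimal sketch of that idea, not the appliance's implementation: the CRC32 hash, the function names, and the sample rows are all assumptions made for illustration, and the node count of 10 simply echoes the "10 active" servers on slide 23.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def node_for(key: str, node_count: int) -> int:
    """Map a distribution-key value to a compute node using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows: List[dict], key_column: str, node_count: int) -> Dict[int, List[dict]]:
    """Hash-distribute rows across compute nodes on a single column."""
    nodes: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[key_column]), node_count)].append(row)
    return nodes


def partial_sum(node_rows: List[dict], column: str) -> int:
    """Work done on one compute node: aggregate only the rows stored locally."""
    return sum(row[column] for row in node_rows)


def control_node_sum(nodes: Dict[int, List[dict]], column: str) -> int:
    """Work done on the control node: combine the per-node partial results."""
    return sum(partial_sum(rows, column) for rows in nodes.values())


rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 101)]
nodes = distribute(rows, "customer_id", node_count=10)
total = control_node_sum(nodes, "amount")
print(total)  # 50500, the same as summing over the undistributed rows
```

Because each node only touches its own rows, the per-node work could run in parallel; a replicated table would instead store the full row set on every node.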