
Comparison of MPP Data Warehouse Platforms



Comparison of MPP Data Warehouse Platforms, including key differences, architectures, trends, costs, maturity and market share.



David Portnoy
312.970.9740
http://LinkedIn.com/in/DavidPortnoy
© 2013-2014

What’s MPP in data warehousing?
MPP (massively parallel processing) data warehouse systems differ from SMP (symmetric multiprocessing) databases in several ways:
1. They use shared-nothing architectures, with no single point of failure and often hot-swappable components.
2. They scale horizontally by adding nodes, rather than by moving to a server with more CPUs or higher storage capacity.
3. They break large queries across nodes for simultaneous processing.
4. They are capable of higher data ingestion rates through parallelized data movement.

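To make points 1 and 3 concrete, here is a minimal sketch of how a shared-nothing system might answer `SUM(amount) GROUP BY region`. It assumes a made-up table, a hypothetical distribution key (`region`), and threads standing in for separate nodes; real MPP engines run the local step on physically separate servers and use their own hashing and planning. Each "node" aggregates only the rows it owns, and a coordinator merges the small partial results.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact table rows: (region, amount). Data is made up for illustration.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 8), ("north", 4)]

def distribute(rows, num_nodes):
    """Assign each row to exactly one node by hashing its distribution key.
    Shared nothing: every node owns only its own slice of the data."""
    partitions = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        node = zlib.crc32(region.encode()) % num_nodes  # deterministic hash
        partitions[node].append((region, amount))
    return partitions

def local_aggregate(partition):
    """Runs on a single node: SUM(amount) GROUP BY region over local rows only."""
    totals = defaultdict(int)
    for region, amount in partition:
        totals[region] += amount
    return dict(totals)

def parallel_sum_by_region(rows, num_nodes=3):
    partitions = distribute(rows, num_nodes)
    # Each simulated node scans and aggregates its own partition concurrently.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    # The coordinator merges the partial subtotals into the final answer.
    final = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] += subtotal
    return dict(final)

print(sorted(parallel_sum_by_region(ROWS).items()))
# [('east', 17), ('north', 7), ('west', 13)]
```

Adding nodes (point 2) only changes `num_nodes`; the merge step stays the same. Because the merge simply adds subtotals, it would also give the right answer if the table were distributed on a different column than the grouping key.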
Who are the players?
Previously, we discussed just the specialized MPP data warehouse vendors:
- Teradata
- Netezza
- Vertica
- Greenplum
But we should keep in mind that most major database vendors also have their own MPP products for data warehousing. Examples include:
- Microsoft PDW (Parallel Data Warehouse)
- DB2 UDB with Database Partitioning Feature (DPF)
- Oracle Big Data Appliance, which just provides a gateway between Hadoop and Oracle’s SMP RDBMS platform
Finally, we need to consider the emergence of SQL-oriented, low-latency Hadoop solutions. Examples include:
- Impala; Stinger; Apache Drill; Phoenix; Shark; Hadapt
- Teradata’s SQL-H (Aster Data); EMC’s HAWQ; IBM’s BigSQL
See related writeup: http://www.slideshare.net/DavidPortnoy/hybrid-data-warehouse-hadoop-implementations

How do the architectures compare?
Looking at the specialized MPP data warehouse vendors:

| | Teradata | Netezza | Greenplum | Vertica |
|---|---|---|---|---|
| Hardware / architecture | Custom MPP, shared nothing | Custom MPP: SPU + FPGA logic | Commodity hardware | Custom hybrid MPP, shared everything |
| Type of processing | OLTP or OLAP; can handle high user load | OLAP; assumes few users for heavy analytics | OLAP | OLAP optimized for large fact tables |
| Inception / maturity | 1979, from Caltech | 2000, by Saxena & Hinshaw | 2003, from Metapa & Didera | 2005, by MIT’s Stonebraker |
| Performance & maintenance | Auto-recommended optimization; columnar compression available | No need for performance tuning; must manually reclaim space | Based on PostgreSQL, but optimized for MPP and enterprise maintenance | Column-oriented optimization for ingestion, storage/compression, and access |
| Hardware | Proprietary | Proprietary | Commodity | Commodity |

Definitions
* OLAP: Online Analytical Processing
* OLTP: Online Transaction Processing

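The table credits Vertica with column-oriented storage and compression and Teradata with optional columnar compression. The sketch below shows, on made-up rows, why a column layout compresses well: a sorted, low-cardinality column collapses into a few runs. Run-length encoding is used here only as one common example encoding; it is not a claim about which encodings either vendor actually uses, and the helper names are hypothetical.

```python
from itertools import groupby

# Hypothetical rows: (order_id, region, amount), sorted by region.
rows = [(1, "east", 10), (2, "east", 7), (3, "east", 3), (4, "west", 5), (5, "west", 8)]

def to_columns(rows, names):
    """Pivot row-oriented tuples into one list per column (column-store layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Compress runs of repeated adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows, ["order_id", "region", "amount"])

# Five region values shrink to two (value, count) pairs.
print(run_length_encode(columns["region"]))  # [('east', 3), ('west', 2)]

# A query that only needs 'amount' reads that one column, not whole rows.
print(sum(columns["amount"]))  # 33
```

The same idea explains the "optimized for large fact tables" entry: analytic queries usually touch a few columns of very wide tables, so reading and decompressing only those columns saves most of the I/O.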
The industry is moving towards open, commodity solutions
Traditional database servers, such as IBM DB2, Oracle Exadata and Microsoft SQL Server, license proprietary software but run on commodity hardware, although the SMP architecture typically favors a few large, expensive servers.
But the biggest MPP data warehouse vendors all have proprietary software, despite the fact that Netezza and Vertica were based on the open source PostgreSQL database. Teradata and Netezza even implement custom hardware, which drives up the price.
Hadoop has made the software component open source, leading to a vibrant ecosystem of tools and applications. And with built-in redundancy, it’s easy to deploy on cheap commodity servers.

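Built-in redundancy is what makes cheap servers acceptable: HDFS stores each data block on several nodes (the default replication factor is 3), so losing one machine loses no data. The sketch below uses a deliberately simplified round-robin placement with hypothetical node names; real HDFS uses a rack-aware placement policy, so treat this only as an illustration of the idea.

```python
# Simplified replica placement: each block is copied to `replication` distinct
# nodes chosen round-robin. Real HDFS is rack-aware; this is only an illustration.
def place_blocks(num_blocks, nodes, replication=3):
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes (mod cluster size) guarantee distinct replicas.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(any(n != failed_node for n in replicas) for replicas in placement.values())

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical commodity servers
placement = place_blocks(num_blocks=6, nodes=nodes)

print(placement[0])  # ['node1', 'node2', 'node3']
print(all(readable_after_failure(placement, n) for n in nodes))  # True
```

The trade-off behind the later cost slides is visible here too: three copies of every block means buying roughly three times the raw storage, which is cheap on commodity disks but not free.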
So the trend looks something like this
[Figure: chart of hardware (specialized vs. commodity) against software (proprietary vs. open source, standardized), placing Traditional Database, MPP Data Warehouse, and Hadoop]
** While the up-front cost of Hadoop may be lower, its TCO (total cost of ownership) could be much higher. This is due to the product’s immaturity, the complexity of solutions, and the scarcity of talent.

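The TCO caveat can be stated as a simple break-even condition. In this model, which is our own simplification rather than anything from the original slides, U is the up-front hardware and license cost, R_t is the recurring cost (staff, support, extra hardware) in year t, and T is the number of years in service; the subscripts H and M stand for Hadoop and an MPP data warehouse.

```latex
\mathrm{TCO}(T) = U + \sum_{t=1}^{T} R_t

% With a constant recurring cost R per year for each platform:
\mathrm{TCO}_H(T) > \mathrm{TCO}_M(T)
  \iff U_H + T R_H > U_M + T R_M
  \iff T\,(R_H - R_M) > U_M - U_H
  \iff T > \frac{U_M - U_H}{R_H - R_M} \quad (\text{assuming } R_H > R_M)
```

In words: if Hadoop’s lower up-front cost (U_H < U_M) comes with higher recurring staff costs (R_H > R_M), its savings are used up after (U_M - U_H)/(R_H - R_M) years, and its TCO is higher from then on.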
Teradata: Hardware and licenses are the most expensive of all options. Staff costs can be high, and it takes a great deal of effort to configure and administer.
IBM Netezza: Hardware and licenses used to cost much less than Teradata’s, but prices have been converging. Staff costs are some of the highest due to scarcity, but that’s tempered by the lower effort needed to configure and administer a single-purpose appliance.
Greenplum: Commodity hardware. Moderately priced licenses. Few Greenplum specialists, but it can be staffed by PostgreSQL DBAs and developers.
Vertica: Commodity hardware. Moderately priced licenses, but its special-purpose orientation limits its usefulness. Few specialists, but it can be staffed by traditional DBAs and developers.
Hadoop HBase: Commodity hardware and no license cost, resulting in the lowest up-front cost. You are likely to buy more hardware for redundancy and load. But it requires highly technical staff, and implementation is less productive than with more mature options.
So let’s look at the relative cost breakdown:
[Figure: relative cost breakdown for each platform, split among hardware, licenses, and development]

What’s their relative adoption today?
Comparing the supply of and demand for administrators and developers can be a proxy for the strength and staying power of a platform. Teradata has been around many years longer than the alternatives and still dominates the market in terms of install base (3 times that of its next rival) and vibrant development community (6 times that of its next rival). But in recent years Hadoop solutions have outstripped Teradata by a significant margin. (Of course, it should be noted that Hadoop includes use cases outside of traditional data warehousing.)

Over time, interest in market leader Teradata has been consistent, but flat
While Netezza, Vertica, and Greenplum have grown, they didn’t take significant market share away from Teradata. (The spike in Netezza interest is attributed to its acquisition by IBM.)

But when Hadoop is added into the mix, the picture changes drastically
Interest in Hadoop has quickly overtaken even interest in Teradata, which might explain why Teradata has been on an acquisition spree for Hadoop-related products and services, such as Aster Data. The future of its nearest rival, Netezza, is uncertain as it seeks its niche within IBM’s product lineup.

Related Reading
- Hybrid Data Warehouse-Hadoop Implementations: http://www.slideshare.net/DavidPortnoy/hybrid-data-warehouse-hadoop-implementations
- Agile Business Intelligence: http://www.slideshare.net/DavidPortnoy/agile-bi-18491924
- Blog: http://david.portnoy.us
