The benefit of the Pure Systems approach can be summarized in four basic tenets, SPEED, SIMPLICITY, SCALABILITY and SMARTS, which we like to call the 4 S's. SPEED: PureData for Analytics is 10-100X faster than traditional systems such as Oracle. When analytic queries take seconds instead of hours, customers get the opportunity to completely rethink their business processes and, in some cases, even launch entirely new businesses. SIMPLICITY: PureData for Analytics is unlike anything that DBAs and IT teams have experienced in the past. Whereas Oracle and Teradata data warehouses require armies of specialists to manage, PureData for Analytics delivers performance out of the box, without requiring any tuning, indexing, aggregations, etc. SCALABILITY: A single appliance scales to more than a petabyte of user data capacity, acting not just as a repository for information but allowing complex analytics to be conducted at scale on all of the enterprise data. SMARTS: By embedding analytics deep in the data warehouse, PureData for Analytics runs high-performance advanced analytics hundreds or even thousands of times faster than was previously possible.
About 80% of the components in the TwinFin are manufactured by IBM. A fully populated rack is a TwinFin12. The host: an active/passive failover configuration. Disk drives: housed at the top of the rack. S-Blades: IBM technology married to the Netezza FPGA cores.
Netezza adds the FPGA to the IBM BladeCenter blade using IBM's “side-car” technology, matching a set of FPGA cores to a corresponding set of processor cores. This side-car is called the Database Accelerator card.
Looking at the S-Blade components side by side, you can easily see where Netezza came up with the TwinFin name for the product. A twin fin surfboard is a board with two fins, and this style of board rides much faster than its single-fin counterparts. Two fins are better than one. The standard blade is on the left, and the Netezza FPGA “side-car” is on the right. This gives the FPGA cores visibility into all the data and processing on the corresponding blade. A SAS Expander Module is included in both halves of the S-Blade so that the processors and FPGAs can access the disk drives holding the warehouse data. There is always an even balance of S-Blade components: 1 CPU core : 1 FPGA core : 2GB RAM : 1 disk drive. Each S-Blade has 8 CPU cores and 8 FPGA cores, mapped to 8 individual disk drives. This is the nature of the MPP shared-nothing architecture that ensures linear scalability.
The architecture is a combination of SMP and MPP, so we call it AMPP: Asymmetric Massively Parallel Processing. SQL statements arrive at the host over the network from external applications, e.g., from the BI application shown, through an open API such as ODBC. The front end of the Netezza appliance is a lightweight host with about 300GB of storage; none of the data lives on the host. The host receives SQL statements, compiles the SQL and builds a compiled query plan, then broadcasts the query instructions to all of the data “nodes”. A 10 Gbit internal Ethernet network connects the major components of the Netezza system. Storage is implemented with an MPP architecture: many (hundreds or thousands of) nodes process data queries in parallel. Each data processing node is attached to a single disk drive that is divided into three partitions. Each node processes its slice of the table data for the given query and transmits its results back to the host, which accumulates the results from all nodes into the final result set returned to the calling application or user. This is a classic divide-and-conquer strategy. In and of itself it is not unique in the industry; it is common to most MPP architectures. One difference is that the back-end disk storage is not directly accessible by the SMP host.
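The divide-and-conquer flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Netezza's implementation: the "compiled plan" is a plain Python function, each data slice is an in-memory list of rows, threads stand in for the processing nodes, and the names `compile_plan` and `run_query` are hypothetical.

```python
# Minimal sketch of the AMPP divide-and-conquer flow (hypothetical names throughout):
# the host "compiles" a query into a plan, broadcasts it to every node,
# each node scans only its own slice, and the host merges the partial results.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Partial = Tuple[int, float]  # (matching row count, sum of the column) for one slice
Plan = Callable[[List[Row]], Partial]


def compile_plan(column: str, min_value: float) -> Plan:
    """Stand-in for the host's SQL compiler. Models:
    SELECT COUNT(*), SUM(column) FROM t WHERE column >= min_value"""
    def plan(slice_rows: List[Row]) -> Partial:
        matching = [row[column] for row in slice_rows if row[column] >= min_value]
        return len(matching), float(sum(matching))
    return plan


def run_query(slices: List[List[Row]], plan: Plan) -> Partial:
    """Host side: broadcast one plan to all nodes, then accumulate the partials."""
    # Threads stand in for the data nodes; each processes only its own slice.
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(plan, slices))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total


if __name__ == "__main__":
    # Three nodes, each holding a different slice of the same table.
    slices = [[{"amount": 5}, {"amount": 20}], [{"amount": 30}], [{"amount": 1}, {"amount": 50}]]
    print(run_query(slices, compile_plan("amount", 10)))  # (3, 100.0)
```

The key design point is that only small partial results (a count and a sum per node) travel back to the host, never the raw rows.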
A key component of Netezza's performance is the way its streaming architecture processes data. The Netezza architecture uniquely uses the FPGA as a turbocharger: a huge performance accelerator that not only allows the system to keep up with the data stream but actually accelerates it, because data is read from disk in compressed form and decompressed at line rate before processing, ensuring there are no bottlenecks in the I/O path. You can think of data streaming in Netezza as an assembly line. The Netezza assembly line has various stages in the FPGA and CPU cores. Each of these stages, along with the disk and network, operates concurrently, processing different chunks of the data stream at any given point in time. This concurrency within each data stream further increases performance relative to other architectures. Compressed data is streamed from disk onto the assembly line at the fastest rate the physics of the disk allows. The data may also be cached, in which case it is served directly from memory instead of disk. The first stage in the assembly line, the Compress Engine within the FPGA core, picks up the data block and uncompresses it at wire speed, instantly transforming each block on disk into 4-8 blocks in memory. The result is a significant speedup of the slowest component in any data warehouse: the disk. The block is then passed on to the Project engine, which filters out columns based on the parameters specified in the SELECT clause of the SQL query being processed. The assembly line then moves the data block to the Restrict engine, which strips off rows that are not needed to process the query, based on the restrictions specified in the WHERE clause. The Visibility engine also feeds additional parameters into the Restrict engine to filter out rows that should not be “seen” by a query, e.g., rows belonging to a transaction that has not yet committed.
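The assembly-line stages described above map naturally onto chained Python generators, each consuming one block at a time from the stage before it. This is a hedged sketch, not the FPGA design: zlib plus JSON stand in for the real on-disk block format (which is not specified here), the transaction-ID field `xid` and the set of committed IDs are hypothetical, and generators only model the block-at-a-time hand-off, whereas the real engines run concurrently in hardware.

```python
# Sketch of the streaming "assembly line". Assumptions: zlib + JSON stand in for
# the real (unspecified) on-disk block format; each generator models one engine.
# Real engines run concurrently in hardware; generators only model the
# block-at-a-time hand-off from one stage to the next.
import json
import zlib
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

Row = Dict[str, object]
Block = List[Row]


def write_block(rows: Block) -> bytes:
    """Helper for the example: store a block on 'disk' in compressed form."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def compress_engine(disk_blocks: Iterable[bytes]) -> Iterator[Block]:
    # Despite its name, this stage *decompresses* each block into rows.
    for raw in disk_blocks:
        yield json.loads(zlib.decompress(raw))


def project_engine(blocks: Iterable[Block], columns: Sequence[str]) -> Iterator[Block]:
    # Drop every column the rest of the pipeline does not need.
    for block in blocks:
        yield [{c: row[c] for c in columns} for row in block]


def restrict_engine(blocks: Iterable[Block],
                    where: Callable[[Row], bool],
                    visible: Callable[[Row], bool]) -> Iterator[Block]:
    # Drop rows that fail the WHERE clause or that the Visibility engine hides.
    for block in blocks:
        yield [row for row in block if visible(row) and where(row)]


def cpu_sum_by(blocks: Iterable[Block], key: str, value: str) -> Dict[object, float]:
    # CPU stage: aggregate the filtered stream (here SUM(value) GROUP BY key).
    totals: Dict[object, float] = {}
    for block in blocks:
        for row in block:
            totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


if __name__ == "__main__":
    committed = {1, 2}  # hypothetical set of committed transaction IDs
    disk = [
        write_block([{"region": "EU", "amount": 150, "xid": 1, "note": "a"},
                     {"region": "US", "amount": 50, "xid": 1, "note": "b"}]),
        write_block([{"region": "EU", "amount": 200, "xid": 3, "note": "c"},
                     {"region": "US", "amount": 300, "xid": 2, "note": "d"}]),
    ]
    stream = compress_engine(disk)
    stream = project_engine(stream, ["region", "amount", "xid"])
    stream = restrict_engine(stream, lambda r: r["amount"] > 100, lambda r: r["xid"] in committed)
    print(cpu_sum_by(stream, "region", "amount"))  # {'EU': 150, 'US': 300}
```

In this sketch the projection keeps the WHERE and visibility columns as well as the SELECT columns, so that the Restrict stage can still evaluate them; the unused `note` column is dropped before any row filtering happens.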
The Visibility engine is critical to maintaining ACID (Atomicity, Consistency, Isolation and Durability) compliance at streaming speeds in Netezza. The processor core then picks up the uncompressed, filtered data block and performs fundamental database operations such as sorts, joins and aggregations on it. It also applies complex algorithms embedded in the snippet code for advanced analytics processing. Finally, it assembles the intermediate results from the entire data stream and produces a result for the snippet. That result is sent over the network fabric to other S-Blades or to the host, as directed by the snippet code.
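To make the isolation role of the Visibility engine concrete, here is a sketch of a snapshot-based visibility rule, a standard MVCC technique. It is an illustration under assumptions, not Netezza's documented implementation: `create_xid` and `delete_xid` are hypothetical per-row fields, and a query's snapshot is modeled as the set of transaction IDs committed when the query started.

```python
# Sketch of a snapshot-isolation visibility check (a standard MVCC technique).
# Assumption: create_xid / delete_xid are hypothetical per-row fields, and a
# snapshot is the set of transaction IDs committed when the query started.
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Row:
    value: int
    create_xid: int      # transaction that inserted the row
    delete_xid: int = 0  # transaction that deleted it; 0 = not deleted


@dataclass(frozen=True)
class Snapshot:
    committed: FrozenSet[int]  # transactions committed before the query began

    def is_visible(self, row: Row) -> bool:
        # Visible only if its insert committed and no committed delete hides it.
        inserted = row.create_xid in self.committed
        deleted = row.delete_xid != 0 and row.delete_xid in self.committed
        return inserted and not deleted


if __name__ == "__main__":
    snap = Snapshot(frozenset({1, 2}))
    rows = [
        Row(10, create_xid=1),                # committed insert: visible
        Row(20, create_xid=3),                # insert not committed yet: hidden
        Row(30, create_xid=1, delete_xid=2),  # committed delete: hidden
        Row(40, create_xid=2, delete_xid=4),  # delete not committed yet: still visible
    ]
    print([r.value for r in rows if snap.is_visible(r)])  # [10, 40]
```

Because the check is a pure per-row predicate, it can be evaluated alongside the WHERE restrictions as each block streams past, which is what allows isolation to be enforced without slowing the stream down.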
We do not have indexes. They are not an option; they simply do not exist. There is no disk administration or system administration. On day 2, the customer has a pool of high-performance disk ready to use. Upgrades are performed by Netezza as a standard maintenance tech support call. Does Oracle help you go from 9i to 10g? Instead of spending time and effort on tedious DBA tasks, use that time for higher BUSINESS VALUE tasks: bring on new applications and groups, quickly build out new data marts, and provide more functionality to your end users.
As data volumes grow, Oracle's complexity increases. As new indexes are created in Oracle, you break existing reports. All of this (indexes, partitioning) is an attempt to outguess the users' data access patterns. Netezza is Database 101: this is as complicated as it gets.
Let's focus on the High-Performance series, as this is the model that the majority of our existing systems equate to, and it will form the majority of data warehouse/data mart implementations. Systems start as small as 3 S-Blades (24 disks, 24 FPGA cores and 24 processor cores). You can then grow a system within the frame up to 12 S-Blades, 96 processors and 125TB of user table space. Beyond this you can add frames, up to 10 frames with a processing power of 896 Snippet Processing Units and 1.25PB of storage. The Entry Level Development and Test system enables customers to have smaller dev/test environments while maintaining the same processor/FPGA/disk ratios.
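Because every S-Blade keeps the same 1 CPU core : 1 FPGA core : 2GB RAM : 1 disk balance, with 8 of each per blade, single-frame resources can be estimated directly from the S-Blade count. The sketch below is a back-of-the-envelope helper, not an IBM sizing tool: the names `size_frame` and `FrameConfig` are hypothetical, it assumes any blade count between 3 and 12 is valid, and it deliberately does not model user capacity or multi-frame Snippet Processing Unit counts.

```python
# Back-of-the-envelope sizing from the S-Blade balance described earlier:
# 1 CPU core : 1 FPGA core : 2 GB RAM : 1 disk drive, 8 of each per S-Blade.
# Names are illustrative; user capacity and multi-frame SPU counts are not modeled.
from dataclasses import dataclass

CORES_PER_SBLADE = 8
RAM_GB_PER_CORE = 2
MIN_SBLADES, MAX_SBLADES = 3, 12  # single-frame range quoted above (assumed contiguous)


@dataclass(frozen=True)
class FrameConfig:
    s_blades: int
    cpu_cores: int
    fpga_cores: int
    ram_gb: int
    disks: int


def size_frame(s_blades: int) -> FrameConfig:
    if not MIN_SBLADES <= s_blades <= MAX_SBLADES:
        raise ValueError(f"expected {MIN_SBLADES}-{MAX_SBLADES} S-Blades, got {s_blades}")
    cores = s_blades * CORES_PER_SBLADE
    # The fixed ratio means every resource scales in lockstep with the core count.
    return FrameConfig(s_blades, cores, cores, cores * RAM_GB_PER_CORE, cores)


if __name__ == "__main__":
    print(size_frame(3))   # FrameConfig(s_blades=3, cpu_cores=24, fpga_cores=24, ram_gb=48, disks=24)
    print(size_frame(12))  # FrameConfig(s_blades=12, cpu_cores=96, fpga_cores=96, ram_gb=192, disks=96)
```

The two example outputs line up with the 24-disk entry configuration and the 96-processor full frame quoted above.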
We make some very bold claims: that we can deliver 10-100X performance that can transform businesses; that our appliance is the “true appliance”, very simple to deploy and manage and requiring no tuning; that TwinFin can scale to petabyte-plus of user data while delivering the 10-100X performance, helping you keep up with growing data; and that our true appliance is Smart and will serve as the foundation for better, more intelligent decision making in the enterprise. These are not claims we make in a vacuum. We will back them up on your data, at your site, with our appliance, at no risk to you. Take TwinFin for a test drive and find out for yourself what it can do for you!
Robert Hartevelt, IBM - PureData System For Analytics - BI Symposium 2012