Data Warehousing Appliances – Fad or Future?
David M Walker
Data Management & Warehousing
December 2006

Despite all the hype from vendors, the basics of data warehousing have remained fundamentally unchanged: extract data from multiple source systems, reformat the information into an easy-to-query structure, load it into a dedicated database and add an effective user interface to allow users to query the information. The cost of this environment is substantial and directly relates to the complexity of the Extract, Transform and Load (ETL) process and the volume of data held in the system.

The complexity of the ETL process has two cost impacts. The first is the cost of the initial design and development, which is reasonably well understood. The second is the cost of changes over the lifetime of the system. For example, if an organisation has four source systems and each system undergoes a change once a quarter, that is sixteen interface changes a year, so the data warehouse support team have to modify and test an interface roughly every three weeks, and all this without any changes in the users' requirements. The volume of data also hits the bottom line, not only in the cost of storage but in the size and (more expensive) skills of the team required to support it, especially as the data explosion forces the business to enter the very large database arena where load time and user query performance are critical.

Against this background it is unsurprising that vendors are looking to compete by reducing storage, improving query times and simplifying administration. Oracle have taken steps to enhance their core database engine with features that improve each of these areas and continue to develop their strategy. However, more and more is built into the core of their flagship general purpose engine, resulting in software that has many features not needed by a specific application.
Sybase have taken the more radical step of creating an entirely new database engine called Sybase IQ that does away with some of the limitations imposed on a general purpose engine, producing a solution that is both much faster in load and user query performance and far more efficient in its disk usage than general purpose databases.

Into this market enter the data warehousing appliance vendors, offering a breed of dedicated, integrated hardware and software solution designed to solve a business's data warehousing woes. Such systems use low-cost commodity components in large volumes with dedicated business intelligence engines to deliver radically faster load times whilst at the same time reducing query times and simplifying the systems administration process.

The first hurdle for many organisations is that data warehousing appliances are proprietary, going against a corporate policy of open systems to allow technology re-use. However, a solution built on one of the current market-leading platforms, Teradata, is no less so. In fact Teradata can be considered one of the original data warehouse appliances, and it is the use of low-cost commodity components and the ability to achieve massive parallelism that differentiate the newcomers.
The second hurdle is credibility: the promises of such large benefits (typically query performance ten to fifty times faster whilst using three to six times less storage, on a platform that requires only a small amount of systems administration support) will be doubted, often by the systems and database administrators who have had to work so hard to maintain the performance of the existing solution. Vendors such as Netezza have overcome this challenge with some key accounts by providing a system on the basis that it will be purchased only if it meets agreed performance criteria, thus significantly reducing the risk to the purchasing company.

The final obstacle is migration: an existing solution that is built, for example, on an Oracle database using Oracle Warehouse Builder and Oracle Discoverer is effectively proprietary and therefore more difficult, but not impossible, to migrate. This is also a reason to review the existing data warehousing architecture now, to ensure that as these and other new technologies come along the business will be able to take advantage of them.

Those organisations that have overcome the hurdles report that they are achieving immediate, huge performance gains for their queries without the need to tune the database, whilst lowering the disk footprint and reducing support costs. The systems also continue to deliver benefit as the fast query times allow more complex data models to be queried, which in turn reduces the need for complex ETL to restructure the data.
These changes to the data model, and the resulting reduction in ETL complexity, can be made either as part of the migration project (which delivers the largest benefit quickly but at the greatest risk) or as part of the change management process for the source systems (which delivers benefit over a longer time frame but significantly reduces the risk).

With a number of entrants into the market, including pure appliance players Netezza and DATAllegro and those developing variations such as Kognitio (offering a virtual appliance) and Sybase (offering an appliance bundle called Data Integration Suite), it is clear that appliances are going to form a key part of data warehouse architectures going forward, the risks of using a smaller vendor and a proprietary solution being outweighed by the business benefit of much more timely information at a significantly reduced cost.

David Walker is a principal consultant with Data Management & Warehousing (http://www.datamgmt.com), a company that has been providing strategic business intelligence consultancy as well as designing large scale data warehousing solutions to clients around the world since 1995. David can be contacted at email@example.com or on 07050 028 911.