Sybase IQ Big Data


Published in: Technology, Business

  1. 1. Sybase IQ
Issue 1, 2012

IN THIS ISSUE
Introduction
SAP Sybase IQ - Turning Big Data into a Big Advantage
Gartner Research: Magic Quadrant for Data Warehouse Database Management Systems
About Sybase

Introduction

“Big Data” is the new hot topic for IT managers and is causing quite a panic in some organizations. There is no need to panic: Big Data can be looked upon as Big Opportunity. With the data explosion, companies now have access to more information than ever before, and if the data can be exploited properly it can lead to a big competitive advantage.

Companies are acquiring massive amounts of data in different forms from different sources, ranging from traditional channels with structured formats to social media channels with unstructured formats, and this has changed the focus of analytics in the “real world”. Throughout organizations there are changes in the way data is being analyzed. In marketing, the focus has shifted to digital channels (click streams and social media) to understand buying patterns and target marketing activities for maximum impact. In sales, the focus is on what we call “deal DNA”: correlating emails, meeting notes and chatter to assess the probability that a sales deal will close. On the financial side, simulation is being used to predict margins and portfolio values, while on the operational side, machine data from sensors and other kinds of digital data are being analyzed to track down operational inefficiencies. It’s no wonder companies are suffering information overload and are at a loss as to how to manage the information, let alone how to use it intelligently.

The key to Big Data is the ability to access and connect all the data, no matter what type it is or where it came from. To achieve this you have to break the information silos that trap data, turning massive amounts of data into actionable insight while providing complete access to decision makers and creating an environment that offers “intelligence for everyone”.

Featuring research from Gartner
  2. 2. SAP Sybase IQ - Advanced Analytics Platform for Big Data

SAP Sybase IQ is an analytic DBMS designed specifically for advanced analytics, data warehousing, and business intelligence environments. Able to work with massive volumes of structured and unstructured data, it is ideally suited to Big Data.

Sybase IQ is built on an open, flexible column-store technology. Traditional relational databases store data by row, slowly working through each row of entire tables and clogging I/O channels, memory, and disk. Sybase IQ instead uses a strategy called “vertical partitioning” that stores data by column, reading only the columns of data used by the query. Using columns, not rows, delivers a 10 to 100 times performance boost compared to the traditional row-based approaches, and Sybase IQ supports most of the popular hardware and OS configurations.

Big Data is not new to Sybase. Sybase IQ has been building on the vision of a big data analytics platform for several years now. The new Sybase IQ 15 family has been a steady progression of releases that have followed a conscious roadmap, each one adding innovations that build upon the foundation and strengths of the previous release. Sybase IQ has been designed to meet the growing needs of IT and Business Analysts to tame the volume, variety and velocity of today’s massive data needs and demands in a cost-effective and attainable manner.

Sybase IQ is based on a three-layer architecture. A strong data management layer is the foundation, with a highly compressed column store and a shared-everything distributed MPP elastic cluster that supports a variety of workloads and an active user community. The application services layer sits above that to provide a variety of drivers, APIs, web services, and federation capabilities to empower developers. Wrapped around these two technology layers is a rich ecosystem of BI tools, partner libraries, packaged applications, and data integration tools to give you an end-to-end solution. (See Figure 1)

Centralized Access to All Your Enterprise Data

Sybase IQ centralizes “Big Data” analysis of massive volumes of structured and unstructured data together, using a wide range of advanced techniques and technologies. As a data-type-agnostic engine, Sybase IQ doesn’t care what format of data you have or even where it came from. Whether it is structured in a defined format, semi-structured and available electronically, unstructured and requiring text mining or analytics tool extraction, or web data such as social media, it simply doesn’t matter with Sybase IQ.

Massive Scalability

With a state-of-the-art query processor, Sybase IQ thrives on heavy ad hoc query usage by large numbers of concurrent users; it is designed to handle it. Built on the PlexQ™ technology framework, which delivers a shared-everything massively parallel processing (MPP) architecture based on a columnar data store, it delivers new levels of performance. Unlike shared-nothing solutions, a PlexQ grid dynamically manages analytics workloads across an easily expandable grid of computing resources dedicated to different groups and processes, making it simpler and more cost-effective to support growing volumes of data and rapidly growing user communities.

With PlexQ grid technology, enterprise IT departments can more easily overcome the scalability limitations of traditional data warehouses. Organizations are now able to support user communities across the enterprise and integrate analytics into business workflows. And it’s easy to leverage advanced analytics within applications by using hundreds of algorithms and data mining models that can run inside Sybase IQ.

Elastic computing of logical servers in the PlexQ™ technology framework within Sybase IQ allows IT staff to group together compute resources in a PlexQ grid into virtual groups, in order to isolate the impact of different workloads and users from each other. When a user connects to a logical server and runs a query, the

SAP Sybase IQ - Turning Big Data into a Big Advantage is published by Sybase. Editorial supplied by Sybase is independent of Gartner analysis. All Gartner research is ©2012 by Gartner, Inc. All rights reserved. All Gartner materials are used with Gartner’s permission. The use or publication of Gartner research does not indicate Gartner’s endorsement of Sybase’s products and/or strategies. Reproduction or distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner’s Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see “Guiding Principles on Independence and Objectivity” on its website,
  3. 3. query execution is only distributed to member nodes of the logical server, and member nodes can be dynamically added or dropped as necessary.

Figure 1: SAP Sybase IQ - A complete and comprehensive big data analytics platform
Source: Sybase

Specialized Tools & Techniques

Sybase IQ has partnered with a number of key advanced analytic partners in order to provide key in-database analytics techniques. Using in-database analytics, enterprises and application vendors can answer complex questions without having to move mountains of data to 3rd party tools. With hundreds of statistical and data mining techniques, advanced text analytics capabilities, and APIs to execute proprietary algorithms safely inside Sybase IQ, companies can gain insights in unparalleled time.

For statistics and data mining, Sybase IQ supports a DBLytix library from Fuzzy Logix containing hundreds of advanced analytic, statistical and data mining algorithms that can run inside Sybase IQ.

For text analytics, Sybase IQ provides comprehensive in-database text search capabilities. With Sybase IQ’s key analytics partnerships, both internal and external, such as SAP BusinessObjects, ISYS and KAPOW, hundreds of document formats and Web content can be ingested and/or extracted into Sybase IQ for analysis.

Sybase IQ provides a native MapReduce API that can leverage massively parallel processing across a PlexQ™ grid. Using MapReduce allows you to move beyond limitations with SQL queries, enabling you to more easily execute alternative techniques such as network analysis, or to search large amounts of unstructured data that is not indexed.

In addition to a native MapReduce API, Sybase IQ offers four ways to integrate results from 3rd party Hadoop frameworks into Sybase IQ queries, giving a tiered approach to analyzing massive data sets. In essence, massive volumes of data can be searched from distributed file systems. The data returned from a Hadoop analysis can then be integrated into a Sybase IQ database in any of the four ways:
• ETL Processing, which bulk loads data from Hadoop data stores into Sybase IQ using the open source utility Sqoop from Sybase’s partner Cloudera.
• Data Federation, which exposes HDFS files as tables in a Sybase IQ database that participate in SQL
  4. 4. queries (HDFS files do not need to be loaded into Sybase IQ).
• Query Federation, allowing SQL queries in Sybase IQ to execute Hadoop processes that return data that is incorporated into the SQL result set; and finally,
• Client-side Federation, which federates queries across Sybase IQ databases and Hadoop files using the TOAD© SQL tool from Sybase’s partner Quest.

Use “R”, the popular open source statistical tool, to query Sybase IQ databases using an RJDBC interface. Furthermore, you can execute R libraries from Sybase IQ as a function call within SQL queries and return result sets.

Sybase IQ also offers in-database execution of Predictive Model Markup Language (PMML) models through a certified plug-in from Zementis. This allows you to automate the execution of analytic models that are defined using an industry standard language and created in SAS, SPSS Clementine, and other popular predictive workbench products. Using industry standard languages enables you to leverage your existing investments while providing better performance and scalability.

Within Sybase IQ, the row store SQL Anywhere engine also allows you to create indexes of geospatial information to search and filter data for analysis in combination with its column store engine. Following the SQL Multimedia (SQL/MM) standard for storing and accessing geospatial data, Sybase IQ supports 2D geometries in the form of points, curves (line strings and strings of circular arcs), and polygons. Sybase IQ also supports flat and round-Earth representations, allowing you to choose the approach that best addresses your situation.

Sybase IQ provides enterprises with APIs to create proprietary analytic algorithms that can run inside the Sybase IQ database server for top performance. In particular, Sybase IQ offers Java and C++ APIs; with these APIs you can create User Defined Functions (UDFs) that are called through SQL queries. The UDFs can access all of the data within a Sybase IQ database and can leverage a PlexQ™ grid for massively parallel processing. Sybase IQ also offers an in-database analytics simulator, which allows you to test a custom-built UDF before deploying it into a production database.

As you can see, in-database analytics is a key component of Sybase IQ’s success as an advanced analytics platform for Big Data. Data volume, accuracy, and swift processing time are all factors critical for success, but the balancing act between these key components continues to pose serious challenges for most organizations. With traditional analytics, data movement can impose severe constraints on timely delivery of the results you need to succeed and can account for up to 75% of cycle time. By running analytic techniques inside the database, Sybase IQ dramatically accelerates performance while avoiding governance and security concerns caused by data movement. You need an analytics environment that can analyze large volumes of data from diverse sources and provide fast, accurate results - Sybase IQ’s in-database capabilities can give you this advantage.

Successful Analytics Platform for Big Data

Sybase is on a mission to revolutionize Big Data Analytics with Sybase IQ. Our centralized data analysis delivers insights across your enterprise, and our support of large user communities running a wide range of analytics workloads allows organizations to analyze hundreds of terabytes, even petabytes, of data at speeds up to 100 times faster. You can see that the big data challenges introduced at the beginning of this article (volume, velocity, variety, costs and skills) are matched by the growing set of features and capabilities offered by SAP Sybase IQ. Now, with accurate, complete information across your enterprise, Big Data doesn’t seem like such a Big Problem; it has turned into a Big Advantage with SAP Sybase IQ!

Source: Sybase
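To make the column-store idea from this article concrete, here is a minimal sketch in Python (not Sybase IQ code) that contrasts a row layout with a column layout for a single-column aggregate. The sample table, the column names, and the helper functions are all hypothetical, and counting "values touched" is only a rough stand-in for I/O cost; it assumes nothing about how Sybase IQ actually stores or scans data.

```python
# Hypothetical sample table; rows and column names are illustrative only.
ROWS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "notes": "rush"},
    {"order_id": 2, "region": "APAC", "amount": 75.5, "notes": ""},
    {"order_id": 3, "region": "EMEA", "amount": 42.0, "notes": "gift"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, column):
    """Row layout: each whole row is read to reach the one column we need."""
    total = 0.0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # every field of the row is read
        total += row[column]
    return total, values_touched


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


if __name__ == "__main__":
    print(sum_row_store(ROWS, "amount"))
    print(sum_column_store(to_columns(ROWS), "amount"))
```

With this sample data, the row layout touches all 12 stored values to sum one column, while the column layout touches only the 3 values in `amount`. Real systems add compression and indexing on top, which this sketch does not attempt to model.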
  5. 5. Gartner Research: Magic Quadrant for Data Warehouse Database Management Systems

The data warehouse DBMS market is undergoing a transformation with the introduction of “big data” and the logical data warehouse demand for new techniques in practices and technology. The integration of professional services with product offerings also increased in importance in 2011.

Market Definition/Description

This document was revised on 05 March 2012. The document you are viewing is the corrected version. For more information, see the Corrections page on

The supplier side of the data warehouse database management system (DBMS) market consists of those vendors supplying DBMS products for the database infrastructure of a data warehouse and the required operational management controls.

For the purposes of this Magic Quadrant analysis, a DBMS is defined as a complete software system that supports and manages a logical database or databases in storage. Data warehouse DBMSs are systems that, in addition to supporting the relational data model (extended to support new structures and data types such as materialized views, XML and metadata-enabled access to content), support data availability to independent front-end application software and include mechanisms to isolate workload requirements (see Note 2) and control various parameters of end-user access within a single instance of the data.

This market is specific to DBMSs used as a platform for a data warehouse. It is important to note that a DBMS cannot be used as a data warehouse; rather, a data warehouse (solution/data architecture) is deployed on a DBMS platform. A data warehouse solution architecture can, and often does, use many different data constructs and repositories. Importantly, the definition of this market is changing and a DBMS will become only part of the overall market definition as the logical data warehouse (LDW) continues to grow in acceptance and deployment.

A data warehouse is a database in which two or more disparate data sources can be brought together in an integrated, time-variant information management strategy. Its logical design includes the flexibility to introduce additional disparate data without significant modification of any existing entity design. A data warehouse DBMS is now expected to coordinate virtualization strategies, as well as distributed and/or processing approaches such as MapReduce, to handle one aspect of big or extreme data situations.

A data warehouse can be of any size. The sizing definitions of traditional warehouses remain as:
• Small data warehouses are less than 5 TB.
• Midsize data warehouses are 5 TB to 20 TB.
• Large data warehouses are greater than 20 TB.

Importantly, none of these categories qualify a warehouse as a “big data” warehouse. Volume alone is not “Big data.” For the purpose of measuring the size of a data warehouse database, we define data as source-system-extracted data (SSED), excluding all data warehouse design-specific structures (such as indexes, cubes, stars and summary tables). SSED is the actual row/byte count of data extracted from all sources.

From 2012 onwards, defining the size of a warehouse will become less important and information asset access will become more important. Within SSED it is important to separate the actual data size in a data warehouse from the database total size. Gartner clients report that many 100-terabyte warehouses often hold less than 30 terabytes of actual data. Throughout 2012 and 2013, the size of a warehouse will evolve toward a combined metric, relative to the repositories under direct management of the warehouse and complemented by the volume of available information accessed by the warehouse, as well as its performance in doing so (see Note 3).

In addition, for the purposes of this analysis, we treat all of a vendor’s products as a set. If a vendor markets more than one DBMS that can be used as a data warehouse DBMS, we note this fact in the section related to the specific vendor, but evaluate its products together as a single entity. Further, a DBMS product must be part of a vendor’s product set for the majority of the calendar year in question. If a product or vendor is acquired mid-year, it will be labeled appropriately but placed separately on the Magic Quadrant until the following year (see Figure 1).

There are many different delivery models, such as stand-alone DBMS software, certified configurations, data warehouse appliances (see Note 1) and cloud (public and private) offerings. These are also evaluated together within the analysis of each vendor.
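The traditional sizing definitions on this slide can be expressed as a small helper. This is an illustrative sketch only: the function name is hypothetical, the input is assumed to be the SSED volume in terabytes, and treating exactly 5 TB and exactly 20 TB as midsize is one reading of the stated ranges rather than something the text spells out.

```python
def classify_warehouse(ssed_tb: float) -> str:
    """Map a source-system-extracted data (SSED) volume in TB to a size class.

    Boundaries follow the slide: small is under 5 TB, midsize is 5 TB to
    20 TB, large is over 20 TB. Boundary handling at 5 and 20 is an assumption.
    """
    if ssed_tb < 0:
        raise ValueError("SSED volume cannot be negative")
    if ssed_tb < 5:
        return "small"
    if ssed_tb <= 20:
        return "midsize"
    return "large"


if __name__ == "__main__":
    for volume in (2, 5, 20, 30):
        print(volume, "TB ->", classify_warehouse(volume))
```

Note that the input is the extracted data volume, not the total database size; as the text points out, a warehouse reported as 100 terabytes may hold far less actual data, and none of these classes by itself makes a warehouse "big data".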
  6. 6. Figure 1. Magic Quadrant for Data Warehouse Database Management Systems
[Figure: quadrant chart. Axes: ability to execute, completeness of vision. Quadrants: challengers, leaders, niche players, visionaries. Vendors plotted: Teradata, Oracle, IBM, EMC/Greenplum, Sybase (an SAP Company), 1010data, Microsoft, ParAccel, Vertica, Kognitio, SAND Technology, Infobright, Actian, Exasol. As of February 2012.]
Source: Gartner (February 2012)

Magic Quadrant

Vendor Strengths and Cautions

1010data

1010data was established 11 years ago as a managed service data warehouse provider with an integrated DBMS and business intelligence (BI) solution primarily for the financial sector and more recently, the retail/consumer packaged goods (CPG) sector. 1010data can host its solution using traditional software as a service (SaaS) model or support a managed solution at the customer’s site. 1010data has approximately 200 customers.

Strengths

• Since 1010data offers a complete SaaS solution, the customer’s business unit and IT organization need little experience of data warehousing or BI. The SaaS model also allows multiple organizations to share large amounts of data without needing to manage it locally; for example, large quantities of CPG data can be shared by multiple retail companies.

As a managed service solution vendor, 1010data can complement the customer’s internal IT department with fast-to-market solutions for business units, so reducing resource consumption within the IT department. More importantly, the managed service model enables 1010data to leverage software solutions across multiple customers. As new applications are created, they become available to all clients, increasing the availability of these applications to businesses. With more than 200 customers, 1010data has reached a position to break out of its former niche status. The problem is that the company is either a visionary with cloud and data warehouse as a service, but does not execute against the rest of the market, or it is good at execution against two of the many use cases in the market with little vision for the remainder.

The 1010data position is almost perpendicular to our combined evaluation criteria. Therefore, we have placed it with high execution against a sub-section of the market we evaluate. From a visionary perspective, 1010data is difficult to evaluate under current criteria. Its approach in using a cloud-based and “as a service” DBMS/analytics solution is the primary business model and technology approach. Cloud-based analytics as a service and the ability to deliver under a managed on-premises model, leaves 1010data short of the much broader vision desired by the greatest portion of the data warehouse market, but in these few delivery segments of the market 1010data is a formidable performance competitor.

• 1010data is expected to add probabilistic matching in 2012. The company has exhibited significantly more reduced load times than some of its significant big data competitors, as well as orders of magnitude and faster performance in extremely large datasets. 1010data products read SQL, but also utilize their own, non-SQL language that performs high-speed joins with unplanned data rationalization built into the queries without the performance disadvantages of using interim return datasets.

• Perhaps the most important point raised by those customers referenced is that 1010data is utilized by both IT and the business with fast response times on queries running against hundreds of billions of row tables (with a combined number of rows throughout
  7. 7. databases exceeding a trillion rows in the entire database in some instances). The company also serves as a data aggregator and data marketplace providing datasets for rapid enhancement and enrichment of analytics normally bound to internal datasets only.

Our reference checks and discussions with Gartner clients also show that 1010data is price-competitive with non-SaaS alternatives, especially by reducing the management overheads needed to support a data warehouse environment. 1010data has expanded from the financial sector (where it began) into a broader market, including the retail sector. 1010data now claims more than 200 customers and its customer references support our belief that it is one of the stronger small data warehouse DBMS vendors. In addition, the company has a small number of customers that install its system on-premises as a managed solution, with several using 1010data as an enterprise data warehouse solution vendor. Therefore, from an execution standpoint, 1010data matches performance, pricing and delivery model for two specific needs in the market quite well and it is expanding both its scope of delivery and its vertical customer base.

Cautions

• The market continues to resist fully-managed data warehouse services in many verticals and horizontal use cases. 1010data is susceptible to resistance from IT departments requiring all its data warehouses to be located in-house, along with in-house governance of the organization’s data assets. The IT market is not fickle and persists in its use of better name-branded vendors and not simply because they are name-branded.

As the demand for hybrid analytics mixing structured data with content increases, 1010data will need to introduce unstructured data analysis as well as operational technology or machine-generated data analysis. 1010data’s competitors have greater financial resources and already are in the process of building out this part of the data warehouse vision.

• One of 1010data’s strengths also acts as a caution. While the business prefers a solution that is a complete, deployment-ready stack, IT departments and purchasing offices do not. 1010data’s offering is sold as a fully integrated DBMS and BI solution, which limits potential customers to those wanting a full solution (primarily because of 1010data’s pricing model). 1010data’s product is a compliant, relational DBMS (RDBMS) that customers can use as a stand-alone system if desired, but fees are charged as if the entire solution is managed. Customers are advised to check the total cost of ownership in such cases, as it may not be advantageous to use 1010data in this way.

• As a solution vendor, 1010data has a different competitive model from vendors of pure-play DBMS offerings. In addition to competing in the data warehouse DBMS market, it competes with system integration vendors that offer outsourced solutions, such as Cognizant and HP (via EDS). Additionally, IBM, Oracle and other large vendors with professional service organizations compete with 1010data in two markets, data warehouse DBMSs and services. It remains to be seen if this is a bias to be overcome or if the cloud and on-premises mix will ultimately exclude a vendor like 1010data. However, based on its extremely positive customer references, it is very unlikely 1010data will be excluded from such a mix.

Actian

Actian offers two products, the general-purpose Ingres DBMS and Vectorwise, a new offering introduced in June 2010 and targeted at analytic data warehouses. Open-source Ingres, one of the original RDBMS engines, has a 30-year history and claims more than 10,000 customers running mission-critical applications, including data warehouses.

Strengths

• The Actian database contains most of the features necessary for data warehousing, such as partitioning, compression, parallel querying and multidimensional structures. Release 10 added bulk load, scalar subqueries, long identifiers and a geospatial offering that was community driven with hundreds of committers contributing code. The performance of Vectorwise, especially in analytic applications, was cited by customers interviewed by Gartner. With the emergence of new server platforms with storage-class memory (of 1 TB and more), Vectorwise will prove a valuable asset for data warehousing and analytics as more of the data warehouse moves to memory.

• Actian has aggressively pursued partners, including independent software vendors (ISVs) in the BI market, the primary driver of new installations in data warehousing. Both new and existing customers are looking for an open-source BI stack with partners such as Jaspersoft and commercial BI vendors such as MicroStrategy have also engaged with Actian. Ingres and Vectorwise are gaining attention from vertical application vendors, system integrators and resellers. Vectorwise uses some Ingres software atop a column store from the MonetDB project and uses hardware assists, turning columns into vectors and processing them in x86 chip registers to leverage
  8. 8. instruction parallelism and on-chip caching. Vectorwise has delivered several top non-clustered TPC-H benchmark results at 1 TB and below. The company was renamed in late 2011 and introduced another new product offering, the Cloud Action Platform, to support the delivery of “Action Apps” that will act on the analytic capabilities Actian supports.

• Previous reference checks have shown Ingres customers to be very loyal. Most have online transaction processing (OLTP) applications, but Ingres has also been used for smaller data warehouses (historically up to about 2 TB, the company is targeting warehouses smaller than 10 TB). Among open-source DBMS, only Oracle’s MySQL compares with proven maturity for mission-critical applications, including data warehousing. Vectorwise has begun to gain new customers and software partners, targeting another set of use cases. Now in its version 2.0, it has added Windows as a platform and has a clear road map for several future releases.

Cautions

• Although Vectorwise enhances Actian’s ability to support analytic data marts, the company must continue to address enhanced data warehouse functionality, storage management and mixed workload management if it is to compete with larger, equally mature vendors and meet the needs of the broader data warehouse DBMS market. Vectorwise needs to support more analytic SQL constructs than it does now and add stored procedures and user-defined functions and data types to move closer to competitors. Its new product and restructuring around Action Apps can be synergistic, but could also prove distracting.

• Actian offers professional services in data warehousing and has a go-to-market strategy with a growing stable of partners; it claims half of its 2011 Vectorwise sales have come through channels. However, it lacks data models and must continue to add marketing and sales expertise for data warehousing. Additionally, Actian has strength in open-source, but the overall adoption of open-source for data warehousing remains weak. While Actian has professional services, it tends to lack some of the tools and methodology support that other organizations have readily available.

• Actian’s new brand and name, as well as its portfolio expansions, can help overcome Ingres’s reputation as an older product that has not regained much market traction. Importantly, Actian has taken a bold stance in attempting to re-establish itself with a new vision and new plans for execution. Initial response to Vectorwise is significant with the addition of more than 20 customers in its first year offering and users should consider Actian’s Vectorwise to be a new and innovative solution in that respect. However, market perception is difficult to change. Both offerings have gained new customers and third-party relationships, but to become a serious competitor in this market Actian must continue to show increased growth in both revenue and numbers of new customers at a higher rate than it has thus far. Effective marketing execution is a must-have for Actian to compete.

EMC/Greenplum

Greenplum is part of the Data Products division of EMC with a massively parallel processing (MPP) data warehouse DBMS running on Linux and Unix. It can be sold as an appliance or as a stand-alone DBMS and has more than 400 customers worldwide.

Strengths

• Greenplum’s understanding and vision of the data warehouse market was ahead of the market as it was one of the first to work with MapReduce, manage external files from within the DBMS and optimize for very large database sizes. As big data is now important in the market and the LDW is emerging as a necessary functionality to support today’s mix of volume, velocity, variety and complexity, Greenplum has a base to support this that was launched several years ago, which translates into the high ability to execute.

Greenplum announced the first unified analytics appliance addressing big data (a modular solution for structured and unstructured data) in May 2011; it was released in September 2011. The EMC Greenplum Data Computing Appliance (DCA) uses the Greenplum Database, Greenplum HD (Hadoop), and Greenplum Data Integration Accelerator (DIA) modules that can be configured within one single appliance cluster. In addition, Greenplum has Chorus, its analytics productivity software, leveraging VMware’s technology, to support automated, self-service data services and collaborative analytics. In a recent announcement, EMC announced the first Hadoop NAS attached HDFS system: HDFS running native on EMC Isilon connected to the Greenplum HD or Greenplum Data Computing Appliance (DCA). Finally, through the external file mechanisms and user defined functions (UDF), Greenplum has started along the path to support LDW. Greenplum even supports an iOS, Linux and Windows single-user development system downloadable as free (not open-source) software.

• As Greenplum has settled into the EMC organization, we have
  9. 9. seen an increase in hiring directly related to development. This, coupled with the EMC development organization, has led Greenplum to offer its DCA supporting big data for both structured and unstructured data and integrated MapReduce processing. The DCA is now assembled by EMC and sold by its sales force. In an interesting manufacturing cost management model, EMC is assembling its appliances in different countries around the world, affording EMC Greenplum a tax advantage in many countries where others (such as Oracle and Teradata) are subject to stiff import duties. This positions the company for easier entry into global markets. Due to the acquisition, Greenplum has been able to work more closely with VMware, for example rearchitecting the Chorus private cloud offering.

• Our customer references support the claims of high performance as well as advantageous price/performance ratios. These references also support the Greenplum claim of scalability to very large database sizes. Reported sizes range from 10 terabytes to more than 500 terabytes. When this combination of performance and scalability is joined to an appliance, the potential of EMC/Greenplum to compete in the data warehouse market is increased.

Cautions

• Although acquired by EMC 18 months ago and despite doubling the install-base, Greenplum’s market position is sixth or seventh worldwide. To really increase the velocity and gain market share, Greenplum must continue to develop the EMC sales force so that it has the necessary skills in the DBMS software market. Greenplum must also continue to leverage the EMC worldwide presence to compete with all the incumbent, large DBMS vendors. Importantly, EMC’s customer base is primarily within the IT unit of the organization. Data warehousing is the technical infrastructure for an intensely business-oriented use-case. EMC will need to learn from its acquired Greenplum knowledge base, specifically how to solution-sell a data warehouse and analytics solution.

• Interestingly, this year our customer references have raised several issues around support. In these cases it was not related to the attention to rapid support and fixes (with all customers stating fixes were available in an expected, timely manner), but more with the bugs in the first place. We would classify these as “growing pains” especially for a small organization (as Greenplum was pre-acquisition) being integrated into a large organization such as EMC. We should also note that in our inquiries with Gartner clients, we have seen this issue diminish, coupled with consistently high marks for personalized customer support.

• As Greenplum leverages EMC more, it will find itself competing at a higher level with the mature, incumbent vendors. The major vendors (such as IBM, Oracle, SAP and Teradata) have a much larger customer base allowing them, as the incumbent, a stronger position. EMC/Greenplum must continue to demonstrate differentiation as it addresses the data warehouse market and big data is one specific area, as is cloud. The company must continue to support customers accustomed to the type of service provided by a small company with focused, customer-specific professional services solutions, issue-focused support and leveraging key customer inputs for product enhancements.

Exasol

Exasol is a small DBMS vendor in Nuremberg, Germany. Exasol has been in business since 2000 with the first in-memory column-store DBMS, EXASolution, available since 2004 and primarily used as a data mart for analytic applications.

Strengths

• Exasol offers an in-memory column-store DBMS for data warehousing. As we have stated, this technology is one of the critical capabilities of the future for the data warehouse DBMS market. Exasol runs in a clustered environment offering scalability across multiple servers. Not only does this allow for high-availability in the case of a server failure using EXACluster OS, but also scaling for larger memory sizes. EXASolution maintains redundant copies of the data in memory to reduce the downtime associated with server failures.

Exasol also includes the use of disk for persistence and overflow (if all the data does not fit in memory). However, when data is loaded into Exasol, it is loaded into memory first and then written to the disk, allowing for the applications to begin before the slower activity of disk input/output (I/O) is completed. This separation of the data access and data persistence model is a visionary change for the market. Additionally, as a column-store, Exasol has excellent data compression (reported to average four times), thus reducing the amount of memory necessary. EXASolution is sold by the amount of memory used for data.

• Another advantage of Exasol, as with other in-memory DBMSs, is the high speed of the DBMS. In published benchmarks, Exasol has attained data warehouse transaction speeds up to 20 times the closest
  10. 10. competitor. Server memory is expensive, but these same benchmarks demonstrated costs of approximately one-third of the standard DBMS. Our reference checks also validate the claims of cost reduction and speed. Another strength of the in-memory nature of Exasol is removing the necessity of optimization and calculation structures within the database. There is no need to build summaries, aggregates and cubes for use in business intelligence and analytics. This reduces the overhead in the DBMS by as much as 10 times, as well as reducing the database administrator (DBA) resources used to maintain such structures. In addition, this also leads to very fast load times, as there are no complicated structures to build during loading.

• Customer references clearly espouse the abilities of EXASolution for both pure performance and cost/performance. The references (although few in number) also state that customer support is excellent. Finally, references corroborate the results of the benchmarks mentioned here, with better than 20 times performance at half to a third of the cost. They also support the claims of 4 times (or more) compression.

Cautions

• The primary challenge Exasol faces is the small size of the company and previous lack of expansion beyond Germany. Exasol was primarily engaged in product development for its first five years of operations and with changes in management two years ago has now obtained the vast majority of its 30 or more customer base in the past two years. These customers are mostly located in Germany, with several in Italy and Japan. Until very recently, Exasol lacked a marketing vision to grow beyond the borders of its European base. The company began an expansion plan in 2011 and will begin to grow offices in other locations, including North America.

• Another issue is the increasing competition, both in column-store and in-memory. Exasol has a clear advantage being the first with an in-memory column-store DBMS. Now, most of the DBMS vendors offer some form of column-store capabilities. Further, when Exasol began, there were only a handful of in-memory DBMS, mostly used for streaming data applications. There are now many in-memory DBMSs available in both the column and row-store variety. Finally, SAP has released its SAP HANA appliance with an in-memory column-store DBMS for an analytics data mart and now available under the SAP NetWeaver Business Warehouse. As with many technologies, being first is not sufficient unless capitalized in growth of market share. Exasol has missed the window of opportunity of being first and now faces increased competition.

• Customer references report that there is one major issue with the use of EXASolution – the lack of interfaces to common BI tools. Exasol offers the standard ODBC and JDBC interfaces, but this can be a performance drawback with tools such as BusinessObjects, Cognos and SAS. As Exasol has a small installed base, it is difficult to engage the tools vendors to assist in creating native interfaces to the DBMS. We do expect to see this remedied over the next few years as the size of the installed base grows. Similarly, there is a reported lack of software to manage the Exasol environment (EXASolution). Again, with a small installed base, third-party management software vendors such as Quest are less likely to support the DBMS, requiring Exasol to create their own management software.

IBM

IBM offers stand-alone DBMS solutions as well as data warehouse appliances, currently marketed as the IBM Smart Analytics System family (ISAS) and the Netezza brand. IBM’s data warehouse software, InfoSphere Warehouse, is available on Unix, Linux, Windows and z/OS. IBM has also continued research and development and market execution for the Netezza brand and product line following its acquisition. IBM has thousands of database customers worldwide and more than 500 appliance customers (Netezza and ISAS combined).

Strengths

• The breadth of IBM technology offerings is complementary to and part of its solution delivery capability. InfoSphere Warehouse, a data warehouse offering based on IBM DB2, is a software-only solution. IBM’s data warehouse appliance solution, the IBM Smart Analytics System (ISAS) is a combined server and storage hardware solution (using the IBM Power Systems server with AIX, the System x server with Linux or Windows and the IBM InfoSphere Warehouse and a robust System z ISAS data warehouse solution), complete with service and support.

IBM’s introduction of InfoSphere BigInsights includes offerings to aid the design, installation, integration and monitoring of the use of Hadoop technologies within an IBM-supported environment. In IBM’s case, it is important to note that it has embraced the vision for the LDW – which Gartner describes as the emerging new best practices in analytics management. By tying together relational data, data streams and Hadoop files,
  11. 11. IBM’s stack builds confidence among managers of existing warehouse implementations that the product is evolving as new demands for these two components of the logical data warehouse emerge.

Additionally, for Smart Consolidation – rather than developing tooling in isolation, IBM focused on tooling that existed in its Information Integration portfolio (InfoSphere BluePrint Director). This resulted in improvements in the area of integration, including but not limited to the common Data Warehouse Packs and Models now supported on DB2 and Netezza platforms alike.

• IBM combines product sales with solution services. This market demands a widely varied level of sophistication and knowledge depending on each client organization’s maturity in analytics and information management. As noted in the overview, the data warehouse market in 2011 has multiple visions for the future. IBM has embraced the logical data warehouse (via “Smart Consolidation”) approach while continuing to advance its technology solutions and implementation practices supporting traditional data warehousing architectures.

Professional services available from IBM range from expert education through turnkey solutions to managed services for data warehousing. Importantly, where IBM leverages its services organization most is in feeding field experiences into the overall data warehouse vision. In 2010, clients reported that IBM’s support appears disconnected from its product strategy – this improved in 2011 with an even larger reference base reporting. This does not mean the issue has been resolved, but it appears that IBM’s focus on solution services is paying off (for example, IBM specifically assigns technical account managers to support accounts). Additionally, IBM’s focus on prospect qualification resulted in a higher growth in 2011 vs. 2009 to 2010 for all of its products.

• The overall effect is that referenced customers are confident regarding release dates and the road map. Customers list concurrency, scalability, performance optimization and support as positives, and these were the most often repeated phrases in the reference survey in 2011. References elaborated by indicating that partitioning, compression and reduced administrative hours all contribute to their experience to support optimized performance.

At the same time, some references reported that optimization of queries should be targeted rather than being forced to optimize every single query because the system is able to engage a solid query plan for execution. This evaluation considers the LDW concept to be innovative, but has yet to see a wider embrace in the market. IBM’s early adoption of the LDW concept in both its messaging and its product road map has established this vendor as an early resource for the market. However, the majority of the market for data warehousing will remain significantly focused on traditional solutions for a minimum of the next three years.

Cautions

• IBM has embraced the logical data warehouse vision as the likely successor to current best practices in traditional data warehousing. The market has not yet determined if it is ready to adopt this approach as the new vision for the data warehouse and abandon 20 years of traditional best practices.

IBM’s professional services have experience in delivering various aspects of the LDW under its own methodology and highlights that the traditional enterprise data warehouse [EDW] is vital to all data warehouse strategies including as a base component for the LDW.

This was IBM’s first incarnation of the LDW approach. The market is acknowledging that the EDW does not have to be the center of the strategy but will be significant. However, the justification for the LDW and evolving existing warehouses or replacing them will be difficult at first because it appears to supporters of traditional data warehouses to be a radical departure from their beloved traditional data warehouse practices. Gartner’s own research indicates that the LDW approach is quickly emerging as the newest data warehouse best practice. Gartner anticipates the LDW will become a best practices approach during 2013-2015. With market leadership there is risk commensurate with the anticipated rewards. IBM will need to continue their careful education message regarding their leadership approach in LDW practices. When engaging in an LDW approach with IBM, clients should ensure they completely understand IBM’s positioning for implementing this solution.

• Gartner inquiries indicate that IBM data warehouse solutions are also marketed and delivered in isolation from each other. There are strategic reasons to continue such an approach with any acquisition, but Netezza products tend to have their own niche in customers’ minds that is viewed as being separate and distinct from IBM (but Netezza’s growth was more than 30% in 2011, which is faster than its previous growth rate as an independent company).

As a result, IBM customers often engage only part of the organization for solutions and at least in the
  12. 12. customer’s minds, eliminate the others. This creates both marketing and sales process challenges. This is not an issue with shortlisted solutions (IBM should recommend one solution or another), but does carry over into the solution delivery team and IBM is missing some opportunities for the different parts of the sales organization to leverage each other. IBM has implemented organizational changes intended to address these issues.

Netezza and IBM personnel do interact and coordinate with each other behind the scenes. A marketing solution would simply begin branding software and hardware combinations for limited purposes. However, IBM will choose the more difficult (and more appropriate) solution of creating an educational sales and implementation process which will demonstrate how software and hardware capabilities can be leveraged effectively to support each use case.

• IBM customers report (via inquiry and reference survey results) a scattering of intermittent and irregular issues with product performance or their implementation experience. Some of these are possibly attributed to the implementation process and not the products. However, these same customers report that IBM support addresses these issues with efficiency. Nonetheless, as with any IT products, an assumption that appliances or certified configurations alleviate all issues is incorrect. Most issues are irregular in nature and IBM support is intimately involved in the resolution process.

Infobright

Infobright has offices in Canada, Europe and the U.S. and offers a combination of a column-vectored DBMS and a fully compressed DBMS. The company provides both an open-source version (Infobright Community Edition [ICE]) and a commercial version (Infobright Enterprise Edition [IEE]). Infobright has approximately 200 customers worldwide.

Strengths

• Infobright remains one of the only column-store DBMS in the open-source software environment. Its revenue is generated from the Enterprise Edition (using a commercial license, rather than a General Public License [GPL]) with a subscription support model based on the amount of SSED stored in the system. As we stated in 2011, Infobright decided in mid-2010 to focus on operational technology data (which it calls machine-generated data). This encompasses data from sources such as smart meter data (in the utilities space), customer data records (in the telco space) and clickstream data from Internet interactions.

This focus has helped Infobright during 2011 where its customer base has grown to more than 200 direct and OEM channel customers. Not only has this focus increased customers, but has also attracted a number of additional OEMs (now accounting for approximately 40% of customers). This, along with partnerships with Pentaho, Jaspersoft, Talend and others, will help the company grow substantially faster than direct sales only.

• Infobright has several unique technologies in the DBMS. In addition to the column-store file system for MySQL, the Knowledge Grid in-memory metadata store is a major differentiator for Infobright, as this product analyzes queries to minimize the number of “data packs” that have to be decompressed to give a result (data packs are the compressed domains/regions of data in Infobright’s offering).

Infobright also released an option for the Enterprise Edition called the Distributed Load Processor (DLP) which allows for the parallel loading of data into the system at very high speeds. Infobright has also added connectivity to Hadoop MapReduce for the processing of “Big data.” This is extremely important to the machine-generated data world as much of this data is stored in Hadoop or other such file systems and needs to be extracted into a DBMS for processing.

• Our customer references are clear on several points. Infobright is extremely fast compared to other systems, including MySQL. Reports of up to an average 500% increase in performance over MySQL deployments have been reported. We believe this is not only from the column-store design, but also the Knowledge Grid. References suggest that Infobright is replacing an existing MySQL environment with great gains in stability, compression and performance. Some cases report a year or more without an outage.

Finally, many references state that simplicity is a factor in their choice to use Infobright. We also believe this will interest OEMs that want to build-in Infobright to their existing systems for resale. The simplicity of management, scalability and compression all interest the OEM looking for a DBMS to embed that requires little support on their part. The focus on machine-generated data has been important to Infobright, but we believe that the future will greatly depend on the company’s ability to leverage these OEM partners.

Cautions

• One of the biggest challenges for a small vendor is to focus on what they do well. Infobright has done this with machine-generated data.
  13. 13. However, as a small, relatively young vendor, Infobright must continue to differentiate its offerings and open-source model from mature column-store DBMSs. Sometimes, these two statements are contradictory not least because the focus on machine-generated data cannot be an excuse for ignoring its existing customers addressing other data management use cases, reported in several customer references as an issue. An example is workload management software, where the managed workloads are basically for machine-generated data and may lack the robustness needed for management of overall workload.

• There are other issues raised by our reference checks. As with most small startup vendors, stability from one release to another can suffer. Customer references reveal that there have been issues with new releases, but they are quick to point out that the problems are quickly resolved. The lack of management software (also an issue for smaller vendors) was raised. Third-party software vendors are not quick to pick up new, young software companies, as the potential market is small, so this puts more pressure on Infobright to produce its own management software.

• Finally, Infobright is open-source and makes use of portions of MySQL, under a Commercial OEM License with Oracle. We always question the open-source model for revenue generation. First, Infobright has a community version with less functionality than the Enterprise Edition. This has proven useful as a trial system to attract new customers, but some may opt for the ICE version in lieu of the Enterprise Edition.

The other issue is specifically the use of MySQL, as it is owned by Oracle. This implies risks remain due to the uncertain future of MySQL. To date, Oracle has not done anything other than enhance the product. However, in the future when the contract is done with EU, we cannot guarantee that Oracle will not change the agreements, especially those with OEMs. This is an issue customers of Infobright should monitor in the future.

Kognitio

Kognitio started by offering data warehouse appliances and warehousing as a hosted service. Today, it has a mixture of fewer than 50 customers using its DBMS (WX2) separately as an appliance, a data warehouse DBMS engine, or data warehousing as a managed service (hosted on hardware located at Kognitio’s sites or those of its partners).

Strengths

• Kognitio pioneered the data warehousing database as a service (dbSaaS) model, where a data warehouse DBMS is delivered as a managed service from the DBMS vendor. Clients buy data warehousing services from Kognitio, while Kognitio hosts the database. Data warehousing dbSaaS permits clients to expand their warehouses incrementally and clients note that this model provides for low upfront costs with virtually no capital expenditure required to get started. This is a growing segment of the data warehouse DBMS market. Kognitio also works with deployment partners such as Capgemini (and contributes to Capgemini’s Immediate cloud computing offering).

Additionally, in line with existing market demands, Kognitio has an appliance to install on-site for customers requiring their own infrastructures. Kognitio opened offices in the U.S. three years ago in addition to its U.K. headquarters and has continued to expand its presence in the U.S. by hiring additional resources. This has started to produce results, with several new customers. Kognitio has also added several hosting partners in the U.S. and the U.K. offering managed services on WX2. Its sales model as dbSaaS makes up almost half of its revenue and has supported much of the company’s growth this year.

• Kognitio continues to invest in in-memory capabilities. Gartner considers that in-memory DBMSs can play a major role in enterprises’ information infrastructure and as such Kognitio’s technology has an opportunity to meet customer demand, given the maturity of its offering, compared to other more recent offerings. Kognitio’s DBMS, WX2 version 7, already includes in-memory analytics, and customer references continue to report that the speed of query and load performance is excellent. In 2011, Kognitio added Pablo in-memory online analytical processing (OLAP) capabilities to further strengthen its analytical capabilities. The DBMS is already an in-memory DBMS, with hot data held in-memory and cold data on disk, managed automatically by the DBMS.

• Those customers referenced reported significant concurrency capabilities, as well as excellent support and product management.

Kognitio is gaining visibility thanks to the current market interest in in-memory technologies. Kognitio’s customers report that deployment of large-scale data warehouse efforts takes as little as 10 weeks using this model. References also report predictable, linear scaling of performance and under the “as a service” model, customers report scale up and scale down needs as part of a solid account management approach. Finally and possibly most importantly, references indicate that new queries and new variations on existing analytics can be deployed rapidly.
  14. 14. Cautions

• Kognitio has a very substantial opportunity in the small or midsize business data warehouse and BI market thanks to its dbSaaS model. However, over the past year, managed services offerings from IBM and HP/Vertica have experienced growing acceptance and penetration in the market. These offerings are not direct competitors to Kognitio’s solution, but the customer base views them as an equal alternative from more established vendors.

Kognitio has not yet addressed some of the very large volume or variety of data support issues – more specifically support for content and complexity aspects of extreme information. However, Kognitio’s in-memory analytical capabilities can be of value in low latency, high volume analytics.

The market shifted dramatically during 2011 toward a new position. Kognitio did not stand still, but market demand regarding new functionality expanded more rapidly than Kognitio’s product feature sets. This appears to only be a temporary condition while Kognitio addresses these new expectations.

• While Kognitio continues to grow its installed base (with an additional seven clients in 2011) the company remains a small vendor with fewer than 50 customers worldwide. This makes it increasingly difficult to sell to organizations that have incumbent vendors, and to compete with some of the lower-priced appliance offerings. Additionally, as a data warehouse outsourcing solution, organizations should be aware that they are still responsible for contracting and auditing data security procedures.

• Clients report interoperability with third-party popular BI tools, such as those of IBM (Cognos) and SAP (BusinessObjects), is difficult to manage. This problem is compounded by Kognitio’s small market penetration and the resulting scarcity of tool expertise in the market. References also report the absence of any form of developers’ forum or marketplace, scarcity of skills in the market and an extremely lean global presence makes commitment to the product and consistent delivery difficult.

Microsoft

Microsoft continues to market its SQL Server 2008 DBMS (Release 2) Business Data Warehouse and Fast Track Data Warehouse for data warehousing customers not requiring an MPP DBMS. Microsoft released its own MPP data warehouse appliance, the SQL Server 2008 R2 Parallel Data Warehouse (Microsoft) (PDW), in November 2010.

Strengths

• Microsoft spent 2011 revitalizing its vision for the data warehouse market. Additionally, it announced two Apache/Hadoop connectors for SQL Server, SMP and Parallel Data Warehouse (PDW) in support of the market’s big data issues. Many would be surprised to learn that Microsoft already provided combined structured and unstructured analysis in SQL Server 2008/R2. A third quarter appliance update included support and enhancements for integration with SAP/Business Objects, MicroStrategy and Informatica.

In addition, Microsoft offers the SQL Server Fast Track Data Warehouse, which includes validated reference architectures for building a balanced data warehouse infrastructure. This road map contributes significantly to the company’s vision for the market and its customers. Microsoft can also leverage SharePoint and PowerPivot and the ability to include an unstructured information type in analytics is the result of its technology blend and this is a strength that should definitely not be ignored.

• References report that Microsoft exhibits one of the best value propositions on the market with a low cost and a highly favorable price/performance ratio. Skills are widely available in the marketplace to operate a Microsoft data warehouse and there is an easy learning curve to acquire those same skills, as needed. As an added bonus, customers report that the integration and continuity of a complete Microsoft data warehouse and business intelligence stack is highly advantageous to time-to-value in delivery. Noticeably absent are any fears regarding vendor lock-in. According to our reference checks and discussions with our clients, worldwide support from Microsoft is extensive, encompassing partners, value-added re-sellers, vendors of third-party software and tools and widely available SQL Server skills.

• Microsoft references indicate a dominant presence in midsize data warehouses, especially those end-user organizations reporting that their companies and their data management needs are growing. According to customer references, Microsoft assures its customers of a solid data warehouse platform including features and functions that run the gamut of traditional warehouse functionality.

For connectivity in a multi-vendor environment Microsoft offers a SAP/BW, Teradata and Oracle connector. The DBMS supports compression and backup compression, partitioned table parallelism, policy-based