We wanted to know how companies viewed the changing data warehousing landscape, so we surveyed 200 businesses to learn more about the issues they faced. In "Delivering the Best of All Worlds for Today's Analytics" we compare the technology, present the options, and provide findings from our survey. We also discuss the latest column store techniques and open source technology to provide both enterprise class performance and affordability.

Transcript

  • 1. Kickfire and MySQL: Delivering the Best of All Worlds for Today's Analytics A Kickfire White Paper
  • 2. What You’ll Learn
    This white paper examines the issues facing companies that are evaluating their data warehousing and analytics architectures and options. It draws on analyst and market research, including a survey of 200 enterprises on their attitudes towards, and concerns about, the adoption of newer data warehousing technologies such as open source databases and appliances.
    The Data Warehousing Landscape
    Today’s data growth is nothing short of explosive. Data comes from all directions, all devices and in more volume than ever before. Organizations are drowning in it but, ironically, they still don’t have the information they need to meet all their business goals.
    Just as data is proliferating faster than ever, so is user demand for access to that data. Internal users, empowered by powerful, easy-to-use PC and browser-based tools, are becoming increasingly savvy, and they are frustrated by IT’s inability to keep pace with their analytics needs. And data consumers are no longer confined to internal users. Today’s customers and business partners are requiring secure query portals through which to view and analyze their business.
    In this context, it should come as no surprise that a 2007 Gartner Group survey [1] found that CIOs identified Business Intelligence (BI) as their number one technology priority, up from number ten just four years ago.
    The good – and bad – news is that there are more options than ever before to help address these issues.
    What Are Your Data Warehousing Options?
    Option 1: Software Options: Traditional, Open Source and Analytic Databases
    Traditional Databases
    The issues associated with this option are known only too well, so we will not dwell on them in this white paper. The traditional database vendors, in cooperation with their hardware partners, can provide the performance, scalability and support that are required, but at a premium price that is often out of alignment with the business benefits, especially on smaller projects.
    In addition to the direct and indirect costs associated with these solutions, there is the issue of vendor lock-in. The deeper enterprises get into the specific feature sets of these databases, and the more reliant they become on the IT teams and partner resources that are tightly aligned with the vendors of these databases, the more difficult it becomes to migrate to lower-cost, more easily implemented alternatives.
    That said, solutions from traditional database vendors remain the most commonly deployed data warehousing solutions for most enterprises. A poll conducted by Kickfire at a recent TDWI event suggests that almost 60% of enterprises still rely on either IBM or Oracle-based data warehousing solutions. However, the ascendancy of these vendors and their solutions is being increasingly challenged by new database and appliance options, as we will discuss in the following sections of this white paper.
  • 3. Open Source Databases
    Many enterprises view open source databases as very attractive alternatives to traditional databases from an initial and total cost-of-ownership perspective. In fact, an April 2008 Kickfire survey of 200 such companies indicates that 66% want to be able to use open source databases, specifically MySQL, for data warehousing. However, many of these enterprises expressed concerns about the open source database’s ability to support their data warehousing needs. The most common of these concerns were:
    [Figure 1: MySQL Data Warehousing Issues and Concerns. Bar chart; vertical axis: % Users, 0 to 100. Categories shown:]
    • Performance - reports / queries too slow
    • Scalability - won’t scale beyond 100 GB
    • Functionality - doesn’t support ad hoc queries
    • Tuning - will need constant tuning
    • Hardware Build-out - scaling only possible by adding servers
    Although these concerns are valid, open source databases have come a long way in a very short period of time. The ability to enhance solutions quickly, supported by the huge open source ecosystem, is one of their most attractive benefits. There are thousands of open source developers around the world, ranging from the IT teams in global enterprises, to developers working for software vendors, to the archetypal lone-wolf open source experts. Between them, they have contributed millions of man hours to the development of these systems – far more development time than even the largest database vendor can bring to its own system.
    Most analysts and industry watchers are united on three key issues regarding open source databases:
    • There are more than 11 million active implementations and over 50,000 downloads per day of MySQL [2] alone, highlighting the fact that open source databases are here to stay.
  • 4. • Open source databases are ready for the enterprise. As Noel Yuhanna, principal analyst at Forrester, stated in his July 2008 Market Update: Open Source Databases [3], “Sun Microsystems’ acquisition of MySQL further validated the open source database market’s worthiness, and enterprises can now expect even more reliability and improved support in the coming years.” In an excerpt from the same report he adds, “Every enterprise should now consider open source databases as part of its overall DBMS strategy, as doing this will deliver cost savings, especially when supporting small to midsized applications.”
    This is not to say that open source databases will immediately replace traditional databases. The heavy financial, application and IT skills investment that most enterprises have in traditional databases ensures that they will continue to be part of the enterprise software landscape for the foreseeable future. However, we believe that where a feature/function fit can be assured between the application and an open source database, many enterprises will choose to deploy the lower cost, more rapidly implemented option.
    The future for most enterprises will not be choosing between traditional and open source databases; it will be developing a co-existence strategy between them and establishing guidelines for users on which database is most suitable for which type of applications and workloads.
    Analytics Databases
    Many of the brightest data warehousing designers have come to believe that traditional databases, no matter what hardware they run on and how well tuned, cannot keep pace with the enterprise’s demand for faster and cheaper analytics. To that end, several companies have launched database products that are wholly optimized for analytics applications. These products typically stand the traditional row-based data storage model on its head and drive data access more efficiently from the column rather than the row.
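
To make the row-versus-column distinction concrete, here is a minimal Python sketch. It is an illustration only, not how any product discussed in this paper is implemented; the sample table, the column names and the "values touched" counter are all hypothetical. It shows why an aggregate over one column reads less data when the table is stored column by column.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample table: (order_id, region, amount).
Row = Tuple[int, str, float]
ROWS: List[Row] = [
    (1, "east", 120.0),
    (2, "west", 75.5),
    (3, "east", 30.0),
    (4, "north", 210.25),
]
COLUMN_NAMES = ("order_id", "region", "amount")


def to_columns(rows: Sequence[Row]) -> Dict[str, list]:
    """Pivot row tuples into one list per column (a toy column store)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def sum_amount_row_store(rows: Sequence[Row]) -> Tuple[float, int]:
    """Sum 'amount' from row storage; every field of every row is read."""
    touched = sum(len(row) for row in rows)
    return sum(row[2] for row in rows), touched


def sum_amount_column_store(columns: Dict[str, list]) -> Tuple[float, int]:
    """Sum 'amount' from column storage; only that one column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_row_store(ROWS))                   # (435.75, 12)
    print(sum_amount_column_store(to_columns(ROWS)))    # (435.75, 4)
```

Both layouts return the same total, but the column layout touches a third of the values for this three-column table. The gap widens as tables get wider, which is the intuition behind the analytics performance claims in this section.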
    The best of these column-based analytics databases deliver truly dazzling performance. Academic research, in the form of the Yale/MIT paper published in 2008 entitled Column-Stores vs. Row-Stores: How Different Are They Really? [4], has confirmed that, for analytics applications, it is extremely difficult for a row-based database to perform at the same level as a column store database.
    However, the greatest strength of this analytics-based approach is also its greatest weakness – analytics performance is optimized at the expense of transactional performance. For most enterprises, this means that the analytics database is a complementary option to their existing traditional databases, not a potential replacement. Analytics databases are still expensive and can require many CPUs to build out sufficient parallelism to achieve their performance targets. Many of the enterprises that we have surveyed are not ready or willing to take on another proprietary data warehousing vendor, much less take on the substantial incremental costs associated with this complementary strategy.
    Option 2: Cloud Computing and Software-as-a-Service Options
    Two of the hottest current topics in IT are the role of Cloud computing and Software-as-a-Service (SaaS) in the enterprise. There has been a lot of press focus around these concepts due to the meteoric growth of one or two of the SaaS vendors and because household brand names have entered the Cloud computing arena. In reality, these options are more delivery-based options than technology options, since the services rely on the vendor’s ability to host either software or appliance-based systems and to deliver cost-effective, managed services rather than the vendor’s ability to innovate and bring new technology to the market.
  • 5. Cloud Computing
    Cloud computing-based options are relatively new and, although some enterprises are piloting these solutions, there are few, if any, examples of multi-terabyte data warehouses deployed using this model. However, it is certainly possible to see how this option might play a role in proof-of-concept projects, projects that will have a limited production life, or projects that need to be deployed extremely quickly.
    The chief concerns around Cloud computing are performance and security. A lot of thought and planning is required before moving large volumes of sensitive and mission-critical data across the public Internet to be processed on a shared storage and computing infrastructure that is processing a mixed workload of online and analytics applications. Issues may include compliance; the speed and logistics of loading terabytes of data over the Internet; and predicting performance on a mixed workload platform when so many variables are in play.
    Surveying the market, there appear to be no Cloud-based “pure-plays”. Thus, even those people who are trumpeting the benefits of the Cloud-based architecture are hedging their bets and supporting this as one of several deployment models. Typically, these vendors point out that this model is well-suited to trial deployments, but they will push for an on-premise model when it comes to large-scale and/or production deployments.
    SaaS Options
    Several Software-as-a-Service (SaaS) vendors have recently launched in the data warehousing/Business Intelligence market. Almost all of these companies are really Business Intelligence tool vendors going to market through a SaaS model. They offer very intuitive web-based user functionality but do little, if anything, at the database level to impact performance. In addition, they share many of the same issues, such as security and predictable performance, that apply to the Cloud-based option.
    Option 3: Appliance Options
    Just as open source databases are here to stay, so are data warehousing or analytics appliances. As James Kobielus of Forrester Research wrote in his April 2008 report Appliance Power: Crunching Data Warehousing Workloads Faster and Cheaper than Ever [5], “Appliances are taking up permanent residence in the heart of the enterprise data center – the data warehouse (DW). DW appliances – in all their bewildering proliferation – are moving into the mainstream.”
    Data warehousing and analytics appliances have been with us for many years and are proven architectures in global enterprises. They are particularly well-suited to sectors such as finance, retail, consumer packaged goods and travel, where the need for high performance and massive scalability is aligned with the ability to spend hundreds of thousands, if not millions, of dollars on appliance-based solutions. These appliances have proven the concept that a purpose-built device can compete with - and outperform - the database plus commodity server solutions that have dominated the data warehousing landscape for so long.
  • 6. We can look at the evolution of data warehousing appliances in a number of different ways. However, looking back over the last 10-15 years, it is clear that there have been three waves of innovation, whether true technology innovation or marketing-led innovation.
    First Generation Appliances: Proprietary Appliances
    The pioneering vendors who developed the first wave of these appliances had to architect their solutions around proprietary hardware and software architectures in order to deliver the performance to establish themselves as a viable alternative to traditional database solutions. Moreover, because these appliance vendors were targeting the highest level of the enterprise market and were competing with database solutions that cost millions of dollars, they built pricing models very similar to those of the database vendors, with entry-level price points in the high hundreds of thousands of dollars.
    The first generation vendors proved that appliances could deliver superior analytics power at a lower price point than the traditional database plus server solutions. However, their proprietary solutions were almost as expensive as traditional options and required highly specialized teams to design, develop, deploy and maintain them.
    Second Generation Appliances: Virtual or Bundled Appliances
    Seeing the success of the early innovators, a second wave of vendors came to market. Although some of these companies brought new technology innovations to market, the majority were more marketing plays based on virtual appliances or loosely coupled bundles of software and hardware components. Many of these second generation appliance vendors are hardware companies that have acquired data warehousing business units as a go-to-market mechanism for their core hardware solutions.
    Others are niche software players looking to benefit from the lower support costs inherent in the appliance model and the reduced total cost-of-ownership economics that they can offer to their customers.
    The good news is that this second generation brought significant marketing spend to the table and educated many enterprises on the benefits of an appliance-based approach to data warehousing. The competition from having multiple, similar vendors in the market also brought prices down to where entry-level price points were typically at the $100,000 level.
    The marketing spend and activity also piqued the interest of many VCs and entrepreneurs, who saw the business opportunity for analytics appliances and believed they could bring new innovation to the market. This investment and start-up activity is now giving rise to a third generation of analytics appliances.
    Third Generation Appliances: Open Source Appliances
    As open source databases become more feature rich and better supported, and as enterprises adopt such databases in increasing numbers, it is inevitable that appliance vendors will use standard or modified open source databases as one of the key building blocks of their solutions in preference to the expensive options from the traditional database vendors.
    However, while these various third-generation appliance vendors use the common building blocks of commodity hardware and open source software, the way in which these vendors have configured these systems is, to repeat Forrester’s word, “bewildering”.
  • 7. Kickfire: The Best of All Worlds?
    Notable in the third generation category is Kickfire. Kickfire’s vision is to develop a “best of all worlds” solution: an open source data warehousing solution supporting the key features of an analytics database architecture, deployed on a dedicated appliance that can deliver enterprise-level performance and functionality at an affordable price point.
    It is no secret that general purpose CPUs have major bottlenecks that need to be eliminated rather than mitigated. Typically, vendors and users seek to ease these bottlenecks through proven techniques such as parallelism, disk striping and advanced tuning. However, these are stop-gap measures that do not address the fundamental issue that general purpose computers are not optimized to move and analyze large amounts of data in a short period of time.
    Kickfire’s appliance-based solution addresses this fundamental issue in a new and innovative way: it is a purpose-built appliance optimized for data warehousing performance that deploys the latest analytic database features on a standard, open source database.
    In line with industry trends towards open source, Kickfire selected Sun’s MySQL database as its database engine and developed a Storage Engine that plugs directly into MySQL’s core architecture.
    So, what’s different about Kickfire?
    • World’s First SQL Chip - at the core of the Kickfire Database Appliance is the world’s first SQL chip, which uses parallel, pipelined data flow to deliver the power of tens of high-end, general purpose CPUs on a single processor. Backed by large amounts of directly addressable memory, this provides blazing raw performance to power the Kickfire software. This architecture, called Dataflow Architecture, has been the basis of many high-performance military, scientific and research systems going back to the mid-1980s and is well-proven.
    • Enterprise Class Data Warehousing Software Features - at the software level, Kickfire brings to MySQL the now proven concept of storing analytics data by columns, not rows. This minimizes read access times and, combined with Kickfire’s highly efficient data compression and hardware-based search indexes, guarantees predictable and scalable performance without the traditional need for constant tuning or adding more and faster hardware.
    • Open Source Standards - unlike other appliances, Kickfire uses the standard Linux Operating System and runs the standard MySQL database. This means that the ever growing range of open source business intelligence tools and utilities for data loading, backup and restore can be leveraged with Kickfire.
    Kickfire’s appliance delivers enterprise-class performance more efficiently than any other data warehousing architecture. Customers therefore benefit from record-breaking performance delivered with the simplicity of an appliance in a cost-effective, low TCO package and the ease of installation and management of a standard database.
  • 8. [Figure 2: Kickfire: Open Source and Industry Standard Architecture. Diagram labels: SQL Chip, Standard Database, Standard Server, Standard Storage, External Storage.]
    Because the Kickfire appliance is a true appliance and not a bundle of loosely coupled hardware components, it has a small form factor and needs minimal power and cooling, in sharp contrast to the racks of commodity servers and disk arrays that are typical of other high-end analytics solutions. In fact, relative to the typical server configurations needed to power traditional terabyte-plus data warehousing solutions, Kickfire’s appliance needs less than 10% of the rack space and consumes less than 650 watts – about the same as a typical microwave oven.
    Unlike many other solutions, Kickfire is targeted towards the data marts and medium-sized data warehouses that most forward-thinking enterprises are implementing in preference to the monolithic, complex data warehouses architected to house every shred of information within the enterprise. Kickfire is targeting data sizes from tens of gigabytes to the low tens of terabytes and is packaged as a purpose-built appliance that is quick to deploy, requires minimal tuning and maintenance, takes up less space and power in the data center and, in many cases, pays for itself with the first project.
    Kickfire believes that the appliance-driven commoditization of enterprise applications is here to stay and should not be the exclusive preserve of enterprises willing and able to spend millions of dollars on data warehousing solutions.
    TPC-H Benchmarks: the Proof is in the Benchmarks
    In May 2008 Kickfire published newly audited results based on the Transaction Processing Performance Council’s TPC-H benchmarks that shocked many of the traditional data warehousing vendors.
    An unknown company, Kickfire, had broken the performance record [6] in the non-clustered category and the price-performance record on the rigorous industry-standard TPC-H 300 GB benchmark, delivering a record-breaking 54,895 queries per hour on a 300 GB database. Not only did Kickfire set a new performance record, it did so at an unheard-of cost – the Kickfire appliance that was tested cost less than $50,000, roughly a quarter the cost of traditional solutions from the database and hardware giants.
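
As a quick sanity check on the price/performance claim, the short Python sketch below recomputes the dollars-per-QphH ratio from the two figures quoted in footnote 6 (a 3-year total system cost of $48,790 and 54,895 QphH@300GB). It assumes the metric is simply total system cost divided by QphH; the function name is made up for this illustration.

```python
# Figures quoted in footnote 6 of this white paper.
TOTAL_SYSTEM_COST_USD = 48_790   # 3-year total system cost
QPHH_AT_300GB = 54_895           # queries per hour on the 300 GB benchmark


def price_performance(total_cost_usd: float, qphh: float) -> float:
    """Return dollars per QphH, assuming cost divided by throughput."""
    if qphh <= 0:
        raise ValueError("qphh must be positive")
    return total_cost_usd / qphh


ratio = price_performance(TOTAL_SYSTEM_COST_USD, QPHH_AT_300GB)
print(f"${ratio:.2f}/QphH@300GB")  # prints "$0.89/QphH@300GB"
```

The result rounds to the $0.89/QphH@300GB reported in the footnote, so the quoted cost, throughput and price/performance figures are consistent with one another.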
  • 9. What does this mean for my business?
    Business in the 21st century is driven by the web, and the architectural basis for the new web economy is the LAMP [7] stack. A key component of LAMP is MySQL, which is emerging as the primary repository of online information worldwide. As data volumes grow, the ability to rapidly analyze this information breaks down because MySQL is architected and optimized to support transactional systems.
    Kickfire is the first and only analytic appliance for MySQL, enabling businesses that depend on MySQL analytics to:
    • Improve profit margins
    • Deploy services faster
    • Offer new self-service, high-performance information applications
    • Consolidate servers and data to reduce cost
    • Achieve 10-100X analytic and reporting performance improvements
    • Scale operations as data volumes grow
    Kickfire customers are active participants in the web economy. While there are many uses of Kickfire for data analysis, two representative use cases are:
    Marketing Analytics
    The movement of services, communications and commerce online presents marketers with compelling opportunities. In the new online world, click-stream data and session history reflect actual customer behavior at a level of detail never captured before. Both B2B and B2C marketers can replace sampling techniques based on focus groups, opinion surveys and shopping observers with real data that reflects the entire population of prospective buyers. Today, customer analytics distinguishes the leaders from the laggards. Leaders leverage customer analytics to optimize campaigns, dictate contact and advertising strategies, segment markets and improve the bottom line.
    Retailers, marketing service providers, e-commerce companies, mobility service providers, telecommunications service providers, government organizations and others are working with Kickfire to better understand their data.
    For one marketing service provider, Kickfire has turned hour-long queries, which had already been hand-coded and optimized, into queries that run in less than thirty seconds. With Kickfire, this marketing service provider can allow more users access to their information and deliver new chargeable services to customers, confident in the performance of the query system.
    Network and Security Management Data Analysis
    Today’s network and security management tools monitor devices and generate ever increasing amounts of as-polled network data. But the systems to analyze this data fall short in both performance and data scalability. Often businesses are forced to analyze less data or buy multiple systems to scale to the desired amount of data. By removing these barriers and enabling analysis of historical data for longer periods of time, managers can more effectively evaluate application availability, plan capacity, determine appropriate thresholds to deliver users timely and relevant alerts, and manage service-level agreements.
    Network and security management tools providers and corporations across many industries are working with Kickfire to deliver greater value from their network data. One network management company has achieved 600X query performance improvements from Kickfire, enabling them to offer trend analysis on a full year of data rather than the present maximum of 30 days of data. And higher performance means less hardware is required, so this same customer will be able to consolidate 5-10 network analysis systems into one Kickfire appliance when fully deployed.
  • 10. Summary
    In this white paper we have examined the issues and options facing companies that are evaluating their data warehousing and analytics needs. We have highlighted the concerns that many enterprises have about the adoption of newer data warehousing technologies such as open source databases and appliances.
    The available evidence appears to support Gartner’s prediction that by 2011, at least 80% of commercial software will contain significant amounts of open source code. [8] Combined with the trend towards appliances as the lowest cost and most efficient vehicle for data warehousing, the case for an open source and, specifically, a MySQL-based data warehousing appliance such as Kickfire is compelling.
    For data warehousing users who are interested in the cost-saving potential of open source databases but concerned over performance and scalability, Kickfire presents a “best of all worlds” option. Combined with MySQL’s low cost-of-ownership and deserved reputation for ease-of-use, Kickfire’s column-based processing, indexing and compression software and the raw power of the Kickfire SQL chip bring enterprise-class performance and data warehousing functionality to MySQL.
    For MySQL users looking at how to use MySQL for data warehousing and analytics, the case for considering the Kickfire appliance is an obvious one: Kickfire brings processing power that general purpose Linux servers simply can’t match and data warehousing-specific software features not available in any other MySQL storage engine or application.
    As the analyst community rightly points out, data warehousing and analytics appliances are now part of the mainstream. Kickfire recognizes this and is leading the way towards making this technology simple to implement, easy to use and affordable for companies of all sizes.
  • 11. [1] Gartner Inc. press release, “Gartner EXP Survey of More Than 1,400 CIOs Shows CIOs Must Create Leverage to Remain Relevant to the Business”, January 23, 2008.
    [2] There are more than 11 million active implementations and over 50,000 downloads per day of MySQL according to Sun Microsystems/MySQL.
    [3] Market Update: Open Source Databases, Noel Yuhanna, Forrester Research. July 2008.
    [4] Column-Stores vs. Row-Stores: How Different Are They Really? Daniel J. Abadi, Samuel R. Madden, Nabil Hachem. In proceedings of SIGMOD 2008.
    [5] Appliance Power: Crunching Data Warehousing Workloads Faster and Cheaper Than Ever, James G. Kobielus, Forrester Research. April 2008.
    [6] As of August 31st, 2008, the Kickfire Database Appliance Series 2400 delivers 54,895 QphH@300GB (Queries per hour on the TPC-H benchmark), propelling Kickfire to world leadership in query performance (non-clustered systems) on the 300GB TPC-H benchmark. Kickfire is also number one in price/performance at $0.89/QphH@300GB USD on the 300GB benchmark. Moreover, Kickfire delivers this record-breaking performance with a 3 year total system cost of only $48,790 USD. Kickfire’s price performance metric can be found at http://www.tpc.org/tpch/results/tpch_price_perf_results.asp. The Kickfire Database Appliance is in beta and will be available October 14, 2008. TPC-H, QphH and $/QphH are trademarks of the TPC. For additional information on the TPC-H benchmark, please visit the Transaction Processing Performance Council’s Web site at http://www.tpc.org/.
    [7] The LAMP “stack” software bundle is the open source web platform consisting of Linux, Apache, MySQL and Perl/PHP/Python.
    [8] http://www.networkworld.com/news/2007/092007-open-source-unavoidable.html