TARGET APPLICATIONS MILLIONS OF USERS/DEVICES ...
IT’S TIME TO REINVENT THE SQL DATABASE WEB-SCALE CLOUD APPLICATIONS ...

Clustrix Database Overview


Clustrix is the leading scale-out SQL database engineered for the cloud. With Clustrix, you can scale transaction throughput, run real-time analytics and simplify operations.

Published in: Technology
  • Introduction to the basic product options and value proposition.
    Key points to emphasize:
      • Clustrix has reinvented the database from the ground up to deliver a revolutionary scale-out SQL database platform.
      • We can deliver the solution any way you want to consume it, as an on-premise appliance or as-a-Service.
      • Clustrix was founded by Paul Mikesell (Isilon founder) and Sergei Tsarev (AOL), and is led by veteran executives from Mercury, Veritas, HP, and Isilon.
    Key takeaways:
      • We are very real and will be around for the long haul.
      • The technology is very real and proven.
      • Your bet on Clustrix is a safe one that will pay off for you in a big way.
      • Clustrix is proven in production at more than 30 customers around the world, like CSC, Symantec, AOL, and Rakuten.
  • Clustrix is built for applications with large data sets, growing customers/transactions and billions of rows. It is proven globally in production with more than 30 customers worldwide. Use this to drive home that our solution is proven in production in the real world, and that big names have bet on us (CSC, AOL, Rakuten, Symantec, etc.).
    Key points to emphasize:
      • Clustrix is proven in production at more than 30 customers around the world, like CSC, Symantec, AOL, and Rakuten.
      • Clustrix has reinvented the database from the ground up to deliver a scale-out SQL database platform.
    Key takeaways:
      • Other new database companies are in Beta or are being used in side-projects.
      • Clustrix is proven in production, and every customer is a reference.
  • This slide sets up the two major problems Clustrix can solve for the customer: performance and scale for Web-scale applications. The legacy database is failing on two fronts:
      • Web-scale applications overload it.
      • It does not map to cloud architecture.
    Key points to emphasize:
      • Legacy relational databases were great for their time, but the architecture is 40+ years old.
      • They were designed to do amazing work for individual “small data” apps.
      • Unfortunately, this leads to two giant problems:
          • They become the “choke point” for new web-scale applications (images on left), as they cannot handle the data set size, concurrency, or throughput demanded.
          • They were not built with Big Data and Cloud in mind.
    Key takeaways:
      • Your legacy database will be the biggest obstacle you face for all these wonderful new web-scale apps you want to build.
      • And your enterprise is almost certainly spending a fortune for those hundreds or thousands of legacy RDBMS instances.
  • This slide illustrates the very poor legacy or alternative options for solving the “scale problem” of legacy databases.
    Key points to emphasize:
      • Scale-up has been the primary way to gain scale from a legacy DB. You want it to go faster, you buy a bigger box. It’s a very expensive stopgap solution, and in some cases, you can’t buy a box big enough.
      • Another option has been to manually “shard” the database. This is a terrible option. Not only is it expensive, much of the power of the DBMS is lost, and you are saddled with permanent, debilitating complexity. And you will probably have to shard again!
      • Finally, you can try NoSQL. Like sharding, most of the power of the database is lost, and the developer ends up (re)writing the database functionality in the app. It’s a very expensive, custom approach to re-inventing the database wheel.
    Key takeaways:
      • The traditional approaches to scaling the database are not workable.
      • Sharding and NoSQL require developer and operations time, which hurts both TCO and time to market.
      • And while NoSQL might work well for some specific workloads like content management, the last thing you want your developers doing is rewriting all your applications and reinventing the database over and over inside each one of them.
  • This slide conveys what we believe are the key characteristics of the ideal database for real-world workloads and the Cloud. In other words, this is the “wish list” for the ideal database.
    Key points to emphasize:
      • Scale-Out SQL is the way to go: Clustrix offers a scale-out SQL database that lets you simply add more nodes to your cluster as demand grows so you can serve more users, transactions and data.
      • High-Scale Transactions: Clustrix delivers high transactional query throughput with near-linear scale at virtually any data set size and concurrency for all real-world query workloads.
      • Real-time Analytics: You can run analytic queries against your main database (while running transactions) to get real-time insights and operational intelligence. Clustrix uses Massively Parallel Processing (MPP), which uses multiple cores across nodes in parallel to speed up your analytic queries.
      • SQL, MySQL and ACID: With Clustrix, you get ACID guarantees and the full power of a SQL interface. Our database is wire-compatible with MySQL, which means that you can use your existing application code and connectors with Clustrix.
      • Self-healing: Clustrix is easy to install and automates fault tolerance. Clustrix is built to be self-managing and simplifies operations, allowing DBAs to focus on high-value-add tasks and greatly reducing the ownership cost.
      • Customer Proven: Clustrix has been serving production workloads since 2008. We power dozens of large-scale production customers all around the world. Our largest customers have datasets with billions of rows, multiple terabytes of data, and non-trivial transactional workloads approaching 100,000 TPS in production.
      • Superior Service: Clustrix provides services that our customers love. Our DBA-on-demand service provides deep technical insight. Managed services in DBaaS monitor your database to find issues before you do.
  • Building a scalable distributed database requires two things:
      • Distributing the data intelligently
      • Moving the queries to the data
  • Clustrix supports MySQL replication both as master and as slave, so you can replicate both ways.
    Within a cluster, as we saw earlier, all data has multiple copies.
    For Disaster Recovery (when a whole region loses power), Clustrix has 2 options:
      • Fast Parallel Backup: this is in addition to the slower MySqlDump backup.
      • Fast Parallel Replication: this is asynchronous across two Clustrix clusters.
  • This slide covers the 4 major use cases for the Clustrix solution. Of course, as a general purpose database, Clustrix can be used for any app or project, and we do want customers to build all their new apps on us.
    Key points to emphasize:
      • High-Scale Transactions (scale without sharding): simply put, Clustrix scales like no other DB, whether it’s data set size, concurrency, transactions, or all three at once. The architecture plus the Flash storage is unbeatable.
      • Operational Intelligence (real-time analytics): Clustrix is so powerful, most customers do real-time analytics on their operational data. No need to set up other systems and ETL the data all over the place.
      • DB consolidation or private DBaaS: this is the future of the database in the enterprise. A private Clustrix DBaaS gives every developer and app a killer database and gives operations a radically simple platform to operate.
      • Business Critical MySQL (MySQL on steroids): MySQL is a great database to get you started, but like all legacy DBs, it can quickly present scale, performance, and availability problems. Clustrix immediately and seamlessly takes away the pain and complexity of MySQL DB problems.
    Key takeaways:
      • Every customer will have one or more of these opportunities.
      • Use this slide to ask which of these are most relevant to their business.
  • Clustrix enables you to run analytic queries against your primary database to get real-time insights while your database is performing transactions. Clustrix removes the costs associated with development, deployment and maintenance of a separate analytic database. Our multi-version concurrency control ensures that analytic queries never interfere with writes. For fast query processing, we use Massively Parallel Processing, which uses multiple nodes and multiple cores within each node to process one query. Clustrix distributes all query types, including complex joins and aggregations.
    Key points to emphasize:
      • A great example of real-time analytics and OLTP powered by the same database.
      • MedExpert has a contract with Medicare that will require them to scale their system by several orders of magnitude.
      • They started on Oracle, then tried MS-SQL. Neither was the long-term solution.
      • They’ve migrated to Clustrix for their long-term solution and expect to grow to 20 nodes.
      • The migration effort was relatively small, as T-SQL is similar to MySQL, and it was more than worth it to implement the right long-term database solution.
    Key takeaways:
      • When customers look closely at the workload they’re throwing at their database, we see an increasing mix of transactions and analytics. Your customer probably sees the same.
      • Having a system that can handle both with ease keeps life much simpler, saves money, and lets the teams focus on the business rather than complex multi-database infrastructure.
    What is MedExpert?
      MedExpert International provides a healthcare solution called IMDS (individual medical decision systems) that is used by companies, labor unions, schools, and government to improve health outcomes and reduce costs. The IMDS application was initially coded with Oracle Enterprise, but because of rising costs, moved to MS SQL. MedExpert realized that not only were both databases expensive to manage, neither offered the scalability needed for the next phase in their project. MedExpert had a lean team that wanted to keep database administration to a minimum as well as remain agile and responsive to business needs.
      With new, large contracts, MedExpert needed to scale from 1,000 concurrent connections to 30,000 within a year, and much more the following year. Concerned that real-time analytic queries were taking too long already, MedExpert considered a short-term solution with larger hardware and flash storage and a long-term solution with sharding. Both were at odds with the goal to maintain a lean and agile operations team.
      MedExpert evaluated Clustrix and achieved query performance gains of 50%-200%. Clustrix also offered predictable performance even when additional capacity was added to the cluster, enabling MedExpert to scale with their growth plan. When the Clustrix configuration was pushed to the limits, instead of crashing, it gradually declined in performance, maintaining the desired availability of the application. Compatibility with Ruby on Rails was another big plus.
      MedExpert standardized on Clustrix exclusively while achieving the desired savings, scalability, performance, and administration goals. The company was able to achieve TCO savings greater than 50% of what the old systems would have cost to scale.
  • Clustrix excels at high-scale transactions at virtually any data set size and concurrency. Clustrix makes scaling easy, so as your business grows, your database grows with it. When your data size and transaction rates exceed the capacity of your current cluster, you can simply expand your capacity by adding nodes. As new nodes are added, we automatically move data to them for immediate utilization. Each additional node gives the cluster more storage, memory and processing power. Our intelligent data distribution ensures that data is evenly distributed across nodes so we maximize node utilization for processing transactions.
    Massive Media owns TWOO, the #1 dating website in the world. TWOO powers their business on Clustrix and is one of our biggest and most successful customers, showing the ability of Clustrix to support radical scale.
    Key points to emphasize:
      • A great example of radical scale without sharding.
      • Massive Media built a hugely popular site, NetLog, years earlier, but had to shard their DB, and it was a nightmare.
      • They swore they’d never do it again.
      • When starting TWOO, they were evaluating Clustrix.
      • TWOO took off like a rocket, and within 3 months, MySQL couldn’t handle the workload.
      • They moved Clustrix from slave to master and never looked back.
      • Since then, they’ve expanded to 15 nodes and drive nearly 70K TPS (that’s ~175 billion per month).
      • They’ve recently purchased a DR cluster, taking them to 30 nodes.
    Key takeaways:
      • This illustrates radical scale on every dimension (data set size, table size, concurrency, transactions).
      • They’ve never had to shard or worry about the database. All they do is focus on business- and customer-driven innovation.
  • Clustrix is built with fault tolerance for high availability. You get the full power of SQL with ACID guarantees. With Clustrix, you can be assured your business-critical data is safe and secure.
    Rafter (formerly BookRenter) makes cloud-based software that helps schools dramatically lower the cost of education, starting with course materials. Rafter works with over 500 colleges and universities across the nation.
    Key points to emphasize:
      • A great example of the Business Critical MySQL use case.
      • Rafter’s business got started on MySQL, and as they quickly gained popularity, they ran into two problems.
      • High concurrency (not data set size) created a scale problem and kept causing the database to crash. They upgraded their hardware, but that just bought them time.
      • In response, they created a replication scheme to protect against downtime, but it was complex and unstable as well.
      • Clustrix gave them a seamless transition and solved their concurrency scale and availability problems at the same time: 80% less downtime and 50% cost savings.
      • They got the added benefit of unlimited scale when they need it; simply add a node.
    Key takeaways:
      • MySQL can run into a variety of problems, not just massive scale (concurrency, performance, availability).
      • Clustrix is the perfect solution: a drop-in replacement that’s inherently fault tolerant and scales in all dimensions without limits.
    Within a cluster, every slice of your data has a replica on another node. You can choose how many replicas you want with fine-grained control, down to table and index level. This means that your database remains available and does not lose data in the face of a single disk or node failure. On failure, some pieces of data have fewer copies left, and for those, additional copies are rapidly regenerated.
    Clustrix also supports parallel replication between clusters. This can be used to set up asynchronous replication across geographical regions. Clustrix parallel backup allows you to create a backup many times faster than a MySQL backup. Master and slave MySQL replication can also be set up using Clustrix.
    With these features you can maintain availability in the face of hardware failures and events affecting geographical regions, like electricity outages and natural disasters. Many of our clients use failover clusters with replication and parallel backup as best practices.
  • Reduce complexity, management and maintenance costs by moving multiple databases onto a Clustrix database. Clustrix offers a simple, cost-effective solution for the database sprawl issues faced by many enterprises with multiple MySQL databases serving internal or customer-facing applications. Clustrix also solves long-term scaling challenges. With the Clustrix platform, you benefit from a self-managing architecture that enables scale-out by simply adding nodes as some of your applications grow and need to scale.
    Founded in 1997, Rakuten, Inc. and the Rakuten Group have become the top Internet service in Japan (generating 1% of the nation’s GDP) and one of the top full-line Internet services companies globally. The Rakuten ecosystem includes e-commerce, online travel, online books and media, and online banking.
    Key points to emphasize:
      • Rakuten is defining the future of databases for the enterprise: the private DBaaS or database cloud.
      • Their vision is to replace over 1150 single-instance databases with a single managed 35-node Clustrix private Database-as-a-Service.
      • They are setting up a service around Oracle as well.
      • Every developer will have access to Clustrix DBaaS, and every app gains the benefit of scale, performance, and fault tolerance.
      • They project a savings of 90% from their current database spend, saving them ~$4M per year.
    Key takeaways:
      • This is what’s next in enterprise database capability.
      • Sets Rakuten up for the next 10 years of Big Data apps.
      • There are only two database platforms that can possibly do this, Clustrix and Oracle Exadata, and Clustrix is 1/10th the cost of Oracle.
      • Incredible cost savings, capabilities, and simplicity.
  • ER: entity-relationship databases. Object: object-oriented databases.

    1. CLUSTRIX OVERVIEW
       The Leading Scale-out SQL Database. Engineered for the Cloud.
       Presenters
    2. WHAT IS CLUSTRIX
       The Leading Scale-out SQL Database. Engineered for the Cloud.
       PRIVATE CLOUD: Flash Appliance
         • Vertically integrated solution
         • In-house: private data center/colocation facility
       PRIVATE CLOUD: DBaaS
         • Fully managed
         • Monthly subscription
         • Uses flash appliance
       PUBLIC CLOUD: DBaaS
         • Maximum flexibility
         • The only scalable primary SQL database in AWS
    5. SCALING A DATABASE IS HARD
         • Scale-Up: expensive band-aid
         • Sharding: engineering and ops overhead
         • NoSQL: engineering and ops overhead
       Diagram labels: Application, Relational Logic, NoSQL
       CUSTOMER PRIORITIES: Cost, Time to Market, Operational Simplicity, Scale and Performance
    6. CLUSTRIX: BUILT FOR SCALE AND THE CLOUD
       HIGH-SCALE TRANSACTIONS
         • Linear scalability for writes/updates/reads
         • Double nodes → double transactions/sec
       REAL-TIME ANALYTICS
         • Linear speedup for analytics
         • Double nodes → half the query time
       REAL WORKLOADS
       SCALE-OUT: Add nodes as demand grows
       SELF-MANAGING
       BUILT-IN FAULT TOLERANCE
       ACID, SQL AND MYSQL
    7. PERFORMANCE AND SCALE
       High Scale Transactions
         • 20 million+ users / 70,000+ TPS
         • Massive Media
         • Near-linear scalability for reads/writes/updates
         • Add more nodes to handle more TPS
       Real-Time Analytics
         • Write heavy workload; 1TB+ writes / day
         • Near-linear speedup for analytics
         • More nodes → faster queries
    8. CLUSTRIX DESIGN
         • Intelligent Data Distribution
         • Massively Parallel Query Processing
       Diagram: SQL requests arrive at three nodes in a Shared Nothing Architecture; each node is labeled Query Compiler, Data map, and Database Engine.
    9. INTELLIGENT DATA DISTRIBUTION
       Tables: Billions of rows
         • Tables split into slices
         • Each slice has replica on another node
         • Adding a node triggers re-balance
         • Losing a node triggers re-protect
       Diagram: slices S1 to S5 and their replicas spread across four nodes.
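The four bullets on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not how Clustrix implements placement: the function names (`place_slices`, `rebalance`, `reprotect`), the round-robin placement, and the "least-loaded node" policy are all hypothetical, and it assumes two copies per slice and at least three nodes.

```python
from collections import defaultdict


def place_slices(slices, nodes):
    """Give each slice a primary and a replica on two different nodes (round-robin)."""
    placement = {}
    for i, s in enumerate(slices):
        placement[s] = [nodes[i % len(nodes)], nodes[(i + 1) % len(nodes)]]
    return placement


def rebalance(placement, new_node, nodes):
    """Adding a node: move copies from the most-loaded nodes until loads differ by at most 1."""
    load = {n: 0 for n in nodes + [new_node]}
    for copies in placement.values():
        for n in copies:
            load[n] += 1
    while True:
        donor = max(load, key=lambda n: (load[n], n))
        if load[donor] - load[new_node] <= 1:
            break
        for s in sorted(placement):
            copies = placement[s]
            # Never put both copies of a slice on the same node.
            if donor in copies and new_node not in copies:
                copies[copies.index(donor)] = new_node
                load[donor] -= 1
                load[new_node] += 1
                break
        else:
            break  # nothing movable; stop rather than loop forever
    return placement


def reprotect(placement, lost_node, nodes):
    """Losing a node: re-create each missing copy on the least-loaded surviving node."""
    survivors = [n for n in nodes if n != lost_node]
    load = defaultdict(int)
    for copies in placement.values():
        for n in copies:
            if n != lost_node:
                load[n] += 1
    for s, copies in placement.items():
        if lost_node in copies:
            kept = [n for n in copies if n != lost_node]
            candidates = [n for n in survivors if n not in kept]
            target = min(candidates, key=lambda n: (load[n], n))
            load[target] += 1
            placement[s] = kept + [target]
    return placement


nodes = ["node1", "node2", "node3", "node4"]
slices = ["S1", "S2", "S3", "S4", "S5"]
print(reprotect(place_slices(slices, nodes), "node4", nodes))
```

After `reprotect`, every slice again has two copies on two distinct surviving nodes, which is the property the "re-protect" bullet describes.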
    10. PARALLEL QUERY PROCESSING
        Simple queries
          • Fielded by any node
          • Routed to data node
        Complex queries
          • Split into query fragments
          • Process fragments in parallel
        Diagram: four nodes.
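The two query paths on this slide can be sketched as follows. This is a toy model, not Clustrix code: the modulo `owner` rule, the in-memory `nodes` list, and all function names are assumptions made for illustration, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each node holds a disjoint subset of rows, keyed by id.
NUM_NODES = 3
nodes = [dict() for _ in range(NUM_NODES)]


def owner(row_id):
    """Map a row id to the node that stores it (simple modulo placement)."""
    return row_id % NUM_NODES


def insert(row_id, amount):
    nodes[owner(row_id)][row_id] = amount


def point_lookup(row_id):
    """Simple query: any node can field it, then it is routed to the data node."""
    return nodes[owner(row_id)].get(row_id)


def fragment_sum(node_rows):
    """Query fragment: a partial aggregate over one node's local rows."""
    return sum(node_rows.values())


def total_amount():
    """Complex query: run one fragment per node in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(fragment_sum, nodes))
    return sum(partials)


for i in range(1, 11):
    insert(i, i * 10)

print(point_lookup(7))  # 70
print(total_amount())   # 550
```

The point lookup touches exactly one node, while the aggregate touches all of them but only ships one partial result per node back to the coordinator.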
    11. REPLICATION AND DISASTER RECOVERY
          • MySQL to Clustrix Replication
          • Clustrix to MySQL Replication
        DISASTER RECOVERY
          • Fast backup: MySqlDump Backup, Clustrix Parallel Backup
          • Asynchronous replication
    12. CLUSTRIX TOOLS: INSIGHT
          • Monitor database health
          • Real-time and historical insight into query performance
    13. DATABASE LANDSCAPE
        Workload types:
          • Transactions (OLTP): Size: 10s of Terabytes; Mode: Online; Best fit: Row stores
          • Real-Time Analytics (OLAP): Size: 10s of Terabytes; Mode: Online; Best fit: Either
          • Data Warehousing: Size: Petabytes; Mode: Offline; Best fit: Column stores
        Chart (data sizes marked: 1TB, 100TBs; axes: Concurrent Writes/Updates, Query Complexity; regions: Single node query processing, Massively Parallel Processing):
          • SHARED NOTHING COLUMN STORES: HP Vertica, EMC Greenplum, Amazon Redshift
          • SHARED NOTHING ROW STORE: Clustrix
          • IN-MEMORY COLUMN STORES: SAP Hana
          • SHARED DATA ROW STORES: Oracle RAC, NuoDB
          • SINGLE NODE ROW STORES: MySql, MS Sql Server, IBM DB2, Oracle
          • IN-MEMORY ROW STORES: MemSQL, VoltDB, MySql Cluster
    14. USE CASES
          • High-Scale Transactions: 10x SCALE without DB experts or app changes
          • Operational Intelligence: 200% performance gain with 50% less TCO
          • MySQL Consolidation: 1/10th TCO benefit by eliminating database sprawl
          • Business Critical MySQL: 90% lower downtime with 50% less TCO
    15. QUESTIONS AND NEXT STEPS
        Questions?
    16. OPERATIONAL INTELLIGENCE
        Analytics Application: Professionals provide expert advice to improve patient outcomes
        Microsoft SQL Server; MedExpert proprietary treatment research
        THE CHALLENGE: New DoD & Medicare contracts; expected 100x increase in usage
        One Scale-Out database
          • 4 nodes, growth to 20
          • Minor application changes & tuning
        Alternatives Considered
          • Fusion I/O: 20% boost
          • No TTM to shard the application
        Why Clustrix
          • POC showed performance boost for analytics queries near term and linear scale for long term
        Clustrix Results
          • 50%-200% faster query response
          • TCO less than 50%
    17. HIGH-SCALE TRANSACTIONS
          • Write heavy workload with 1TB+ writes per day
          • 20 million+ users / 70,000+ TPS
        CLUSTRIX 18 NODES
          • 11X+ the TPS of a single MySQL server
          • 20B+ Rows of data
        “Pre-Clustrix, we spent a lot of time on optimizing for performance and scale. Now we can spend those resources better.” Toon Coppens, CTO and Co-Founder, Massive Media
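As a quick sanity check on these throughput figures, the arithmetic below converts a sustained TPS rate into a monthly transaction count and back. It assumes a 30-day month and a constant rate, which real traffic will not match exactly; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transactions_per_month(tps, days=30):
    """Total transactions if `tps` were sustained for `days` full days."""
    return tps * SECONDS_PER_DAY * days


def sustained_tps(total_transactions, days=30):
    """Average TPS needed to reach `total_transactions` in `days` days."""
    return total_transactions / (SECONDS_PER_DAY * days)


print(transactions_per_month(70_000))  # 181440000000
print(round(sustained_tps(175e9)))
```

Under these assumptions, a full 70,000 TPS gives about 181 billion transactions in 30 days, so the "~175 billion per month" quoted in the speaker notes corresponds to an average somewhat below 70K TPS, consistent with the notes' wording of "nearly 70K TPS".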
    18. BUSINESS CRITICAL MYSQL
        SaaS Application: Low cost course materials for education
        Chaotic/Unstable MySQL Environment
        THE CHALLENGE: Back-to-School Expansion; uptime during critical peak season
        3 node clusters; 2 geographic locations; Automated Fault Tolerance & Easy Expansion
        Alternatives Considered
          • HW upgrade = stop gap
          • Replication implementation was unstable & custom
        Why Clustrix?
          • POC showed easy to upgrade and expand a live Ruby on Rails application
        Clustrix Results
          • 80% reduction in downtime
          • TCO reduction of 50%
    19. MYSQL CONSOLIDATION
        E-Commerce: ¥1.2 trillion per year
        MySQL Sprawl
          • 1150 databases
          • 100 DBAs
        CHALLENGE: Availability #1 priority
        Private DBaaS
          • 10:1 Compression
          • Re-deploy staff
        Alternatives Considered
          • Fusion I/O couldn’t keep up
          • MySQL tools: too unstable
        Clustrix Results
          • 90% lower TCO
          • 14 nodes today, growth to 35
    20. CLUSTRIX TECHNOLOGY
        Intelligent Data Distribution
          • Tables: Billions of rows
          • Tables split into slices
          • Auto-distribute, auto-protect, re-protect
        Parallel Query Evaluation
          • Normal queries: fielded by any node; routed to data node
          • Complex queries: split into query fragments; process fragments in parallel
        Diagram labels: SQL, JSON, slices S1 and S2, steps 1 to 3
          • Application sees a linearly scalable, single instance MySQL database
          • Automatic fault tolerance
          • Online expansion, data (re)distribution, and schema changes
    21. SCALE AND FAULT TOLERANCE
          • All data has multiple copies on different nodes
          • Re-balance on adding a node
          • Re-protect on losing a node
        Diagram: slices A to E and their copies spread across four nodes.
    22. PARALLEL QUERY PROCESSING
        Simple queries
          • Fielded by any node
          • Routed to data node
        Complex queries
          • Split into query fragments
          • Process fragments in parallel
        Diagram: three nodes.
    23. ANALYTIC QUERY PROCESSING
        Analytic queries get speedup from Massively Parallel Processing
          • Concurrent Parallelism
          • Pipeline Parallelism
        Query: SELECT a, b FROM A JOIN B on (id) WHERE (A.a = 15)
        Steps shown in the diagram:
          • Start Node
          • Read A, apply filter (on each node)
          • Send each row to correct Node based on id
          • Read B and Join (on each node)
          • Return to User
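The steps on this slide (filter A locally, forward each row by id, join against local B, return) can be walked through in a small single-process simulation. It is a sketch under assumptions rather than the Clustrix executor: the `node_for` modulo rule, the list-of-dicts tables, and the sample data are all made up for illustration, and the per-node loops run one after another here where a real cluster would run them concurrently.

```python
NUM_NODES = 3


def node_for(row_id):
    """Hypothetical placement rule: B is sliced across nodes by id."""
    return row_id % NUM_NODES


def run_join(a_nodes, b_nodes, a_value):
    """Simulate SELECT a, b FROM A JOIN B ON (id) WHERE A.a = a_value."""
    # Step 1: every node reads its local part of A and applies the filter.
    filtered = [[row for row in rows if row["a"] == a_value] for rows in a_nodes]

    # Step 2: forward each surviving A row to the node that owns its id in B.
    inbox = [[] for _ in range(NUM_NODES)]
    for rows in filtered:
        for row in rows:
            inbox[node_for(row["id"])].append(row)

    # Step 3: every node joins the forwarded A rows with its local B rows.
    results = []
    for n in range(NUM_NODES):
        b_by_id = {}
        for b_row in b_nodes[n]:
            b_by_id.setdefault(b_row["id"], []).append(b_row)
        for a_row in inbox[n]:
            for b_row in b_by_id.get(a_row["id"], []):
                results.append((a_row["a"], b_row["b"]))

    # Step 4: the combined rows go back to the start node, then to the user.
    return results


# Sample data: A is spread round-robin across nodes, B is placed by id.
a_rows = [{"id": i, "a": 15 if i in (2, 3, 5) else 0} for i in range(1, 7)]
b_rows = [{"id": i, "b": f"b{i}"} for i in range(1, 7)]
a_nodes = [a_rows[n::NUM_NODES] for n in range(NUM_NODES)]
b_nodes = [[r for r in b_rows if node_for(r["id"]) == n] for n in range(NUM_NODES)]

print(sorted(run_join(a_nodes, b_nodes, 15)))
# [(15, 'b2'), (15, 'b3'), (15, 'b5')]
```

Filtering before forwarding is the design point worth noticing: only the three rows that pass `A.a = 15` cross the network, not all of A.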
    24. SQL FOR STRUCTURED DATA
        With increasing data size, struggling old SQL implementations are replaced by new Distributed SQL.
        Unstructured Data
          • NoSQL: MongoDB, CouchDB, Hadoop
        Structured Data
          • Relational: Oracle, System R, Postgres, Ingres
          • Distributed SQL, Primary: Clustrix
          • Distributed SQL, Warehousing: Vertica, Greenplum
        Timeline (1970, 1980, 1990, 2000, 2010), in order:
          • SQL wins; Hierarchical loses; Network loses
          • SQL wins; ER loses; Object loses
          • Single Node SQL struggles; NoSQL wins
          • Distributed SQL wins
    25. CLUSTRIX APPLIANCE
        Clustrix Appliance 3 Node Cluster (CLX 4110)
          • 24 Intel Xeon CPU cores
          • 144GB RAM
          • 6GB NVRAM
          • 1.35TB Intel SSD protected (2.7TB raw) data capacity
          • Low-latency Infiniband interconnect