Introduction To Maxtable


An introduction to Maxtable, a massive, high-performance, scalable storage system for structured or unstructured data.

Published in: Technology

  1. Introduction to Maxtable (Xue Yingfei)
  2. Agenda (5 Mar 2012)
     • Architecture Overview
     • Key Features
     • Maxtable Query Language (MQL)
     • Operation and Maintenance
     • Future Works
  3. Architecture Overview (1)
     Maxtable consists of three components:
       1. Metadata server: provides the global namespace for all the tables in the system and keeps the B-tree structure in memory.
       2. Ranger server: holds some ranges of the data; the default size of one range is about 100 GB.
       3. Client library: linked with applications so that they can read and write data stored in Maxtable.
     What the components of the system are and how they relate to one another.
  4. Architecture Overview (2)
     How is a table stored on disk?
     • One SSTable = 4 MB of data.
     • One tablet = 25K SSTables = one range = 100 GB.
     • One table = 42K tablets.
     So one table can hold more than 4 PB of data. To hold more, the block size can be increased, or two tablet levels can be used to store the index data.
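The capacity figures on this slide can be checked with a few lines of arithmetic. The sketch below reads "M" and "K" as binary units (1024-based), which is an assumption; the slide does not say which convention it uses.

```python
# Sizes taken from the slide; binary units are an assumption.
MIB = 1024 ** 2
GIB = 1024 ** 3
PIB = 1024 ** 5

SSTABLE_BYTES = 4 * MIB            # one SSTable = 4M
SSTABLES_PER_TABLET = 25 * 1024    # one tablet = 25K SSTables
TABLETS_PER_TABLE = 42 * 1024      # one table = 42K tablets

tablet_bytes = SSTABLE_BYTES * SSTABLES_PER_TABLET
table_bytes = tablet_bytes * TABLETS_PER_TABLE

print(f"one tablet (range): {tablet_bytes / GIB:.0f} GiB")
print(f"one table: {table_bytes / PIB:.2f} PiB")
```

Under this reading a tablet is exactly 100 GiB and a table slightly over 4 PiB, which matches the "more than 4PB" claim.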
  5. Architecture Overview (3)
     How does Maxtable work?
     • Maxtable stores data in a table, sorted by a primary key (the first column).
     • There are two data types in a table: varchar (string) and int (number).
     • Scaling is achieved by automatically splitting tables into contiguous ranges and assigning them to different physical machines.
     • There are two types of servers in a Maxtable cluster: Ranger Servers, which hold some ranges of the data, and Meta Servers, which handle metadata management and oversee the Ranger Servers.
     • A single Ranger Server may hold many contiguous ranges; the Meta Server is responsible for farming them out in an intelligent way.
     • If a single range fills up, the range is split in half (middle-split). The top half stays in the current range, and a new range is allocated for the lower half. Both ranges stay on the current Ranger Server until it becomes overloaded; the Rebalancer then triggers the Meta Server to reassign some of the ranges on overloaded Ranger Servers to other Ranger Servers that have enough space.
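The middle-split described on this slide can be sketched as follows. This is a minimal illustration, not Maxtable's implementation: `middle_split`, `insert_row` and the tiny `capacity` limit are hypothetical names and values chosen for the example.

```python
import bisect


def middle_split(rows):
    """Split a sorted list of (key, value) rows into two halves at the middle."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def insert_row(ranges, key, value, capacity=4):
    """Insert into the range covering `key`; split that range if it overflows.

    `ranges` is a list of sorted row lists ordered by key, all held by one
    (hypothetical) Ranger Server. `capacity` is a made-up per-range row limit.
    """
    # Choose the last range whose first key is <= key (else the first range).
    first_keys = [r[0][0] for r in ranges[1:]]
    idx = bisect.bisect_right(first_keys, key)
    bisect.insort(ranges[idx], (key, value))
    if len(ranges[idx]) > capacity:
        top, lower = middle_split(ranges[idx])
        # Both halves stay on the same server until a rebalance moves them.
        ranges[idx:idx + 1] = [top, lower]


ranges = [[]]
for k in ["adidas", "lining", "nike", "puma", "reebok"]:
    insert_row(ranges, k, len(k))
print(ranges)
```

The fifth insert overflows the single range, so it is split into two ranges that remain on the same server; moving one of them elsewhere would be the Rebalancer's job.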
  6. Key Features (1)
     • Scalability: new ranger nodes can be added as storage needs grow; the system adapts to the new nodes automatically while the rebalance runs.
     • Data writes: when an application inserts data, the write can be cached at the Ranger server, and the cache is flushed periodically. For consistency, applications force the data log to be flushed to disk.
     • SSTable Map: this feature reduces data consistency control and improves write performance. It uses an innovative method that needs no locking for concurrent writes to resolve conflicts between them.
     • Cache all data: Maxtable can cache all the metadata in the Metaserver and the hot data in the Ranger server.
     • Re-balancing: a tool rebalances the tablets among Ranger servers to balance the workload among nodes.
  7. Key Features (2)
     • Index: Maxtable automatically builds one unique index for each table on the first column.
     • Recovery: Maxtable implements write-ahead logging (WAL) to make writes safe. It can recover a crashed server by replaying its log.
     • Failover: the Metaserver maintains a heartbeat with each Ranger server. When it detects that a Ranger server is unreachable, it fails over the data service on the crashed Ranger server to another Ranger server, which continues serving those ranges.
     • Metadata Consistency Checking (MCC): data checking tools that ensure data consistency between the Metaserver and the Ranger servers.
     • Backend storage: Maxtable can use a distributed file system as its backend storage; currently it can use KFS.
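The recovery idea on this slide (log first, then replay the log after a crash) can be sketched in a few lines. This is a generic write-ahead-log illustration under stated assumptions, not Maxtable's log format: `TinyStore`, the JSON record layout and the file name `ranger.log` are all hypothetical.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory table protected by an append-only log file (hypothetical)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = {}
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying every logged record.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "insert":
            self.rows[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.rows.pop(record["key"], None)

    def _log(self, record):
        # Append the record and force it to disk before touching memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def insert(self, key, value):
        record = {"op": "insert", "key": key, "value": value}
        self._log(record)
        self._apply(record)

    def delete(self, key):
        record = {"op": "delete", "key": key}
        self._log(record)
        self._apply(record)


log_path = os.path.join(tempfile.mkdtemp(), "ranger.log")
store = TinyStore(log_path)
store.insert("adidas", 1000)
store.insert("lining", 500)
store.delete("lining")

# Simulate a restart: a fresh instance recovers its state from the log alone.
recovered = TinyStore(log_path)
print(recovered.rows)
```

A production log would also need to handle a torn final record and truncate the log after a checkpoint; both are left out here to keep the sketch short.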
  8. Key Features (3)
     • Range query
       – Supports range queries on the index column or a non-index column.
       – Supports AND and OR in the WHERE clause.
       – Splits the work over all the range nodes in a cluster.
     • Sharding
       – Automatic sharding: distributes tablets over Ranger servers.
       – Manual sharding: scans all the tablets and splits those that have at least two blocks containing data. Customers who want better scaling can shard tablets manually.
       – Manual sharding is generally followed by a rebalance operation, because sharding may create new tablets.
  9. Maxtable Query Language (1)
     • CREATE TABLE: create one table.
       – create table table_name (column1 type1, ..., columnx typex)
       – create table blogdata (key varchar, num int, createtime varchar, comment varchar)
     • INSERT: insert one data row.
       – insert into table_name (column1_value,...columnx_value)
       – insert into blogdata (adidas, 1000, 2011-10-11, good)
     • SELECT: select one row by the default key column.
       – select table_name (column1_value)
       – select blogdata (adidas)
     • SELECTRANGE: select the rows in a key range specified by the user.
       – selectrange table_name (column1_value1, column1_value2)
       – selectrange blogdata (adidas, lining)
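To make the meaning of SELECT and SELECTRANGE concrete, here is a small in-memory model of a table sorted by its first column. It is only a sketch of the semantics, not the Maxtable client API: the `SortedTable` class is hypothetical, and treating the range bounds as inclusive is an assumption the slide does not confirm.

```python
import bisect


class SortedTable:
    """Rows kept sorted by their first column, which acts as the primary key."""

    def __init__(self):
        self._keys = []   # sorted primary keys
        self._rows = {}   # key -> full row tuple

    def insert(self, row):
        key = row[0]
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def select(self, key):
        # Models: select table_name (column1_value)
        return self._rows.get(key)

    def select_range(self, low, high):
        # Models: selectrange table_name (low, high); bounds assumed inclusive.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [self._rows[k] for k in self._keys[lo:hi]]


blogdata = SortedTable()
blogdata.insert(("adidas", 1000, "2011-10-11", "good"))
blogdata.insert(("nike", 700, "2011-10-12", "ok"))
blogdata.insert(("lining", 500, "2011-10-13", "fine"))
blogdata.insert(("anta", 50, "2011-10-14", "new"))

print(blogdata.select("adidas"))
print([row[0] for row in blogdata.select_range("adidas", "lining")])
```

Because the keys are kept sorted, a range lookup is two binary searches plus a slice, which is the property that makes key-range queries cheap on a table sorted by primary key.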
  10. Maxtable Query Language (2)
      • SELECTWHERE: select rows by the WHERE clause.
        – selectwhere table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • SELECTCOUNT: get the number of rows that match the WHERE clause.
        – selectcount table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • SELECTSUM: get the sum of one column over the rows that match the WHERE clause.
        – selectsum (column_name) table_name where columnX_name(columnX_value1, columnX_value2) and columnY_name(columnY_value1, columnY_value2)
      • DELETE: delete one row.
        – delete table_name (column1_value)
      • DROP TABLE: drop one table.
        – drop table_name
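The three WHERE-based statements can be modelled as filters over rows. The sketch below assumes that `column(value1, value2)` means an inclusive range on that column and that the conditions are joined by AND; both are readings of the slide's syntax rather than confirmed behaviour, and all function names are hypothetical.

```python
def select_where(rows, conditions):
    """Return the rows that satisfy every (column, low, high) condition."""
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, low, high in conditions)
    ]


def select_count(rows, conditions):
    """Number of rows matching the conditions."""
    return len(select_where(rows, conditions))


def select_sum(rows, column, conditions):
    """Sum of `column` over the rows matching the conditions."""
    return sum(row[column] for row in select_where(rows, conditions))


rows = [
    {"key": "adidas", "num": 1000},
    {"key": "anta", "num": 50},
    {"key": "lining", "num": 500},
    {"key": "nike", "num": 700},
]
# Models: ... where key(adidas, lining) and num(100, 1000)
conditions = [("key", "adidas", "lining"), ("num", 100, 1000)]

print(select_where(rows, conditions))
print(select_count(rows, conditions))
print(select_sum(rows, "num", conditions))
```

In a cluster, each Ranger server could run the same filter over its own ranges and return partial counts or sums for the client to combine.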
  11. Maxtable Query Language (3)
      The following commands are for administrators.
      • SHARDING: shard one table.
        – sharding table_name
      • MCC CHECKRANGER: check the state of the rangers.
        – mcc checkranger
      • MCC CHECKTABLE: check the data of a table.
        – mcc checktable table_name
      • REBALANCE: rebalance the data load over the rangers.
        – rebalance table_name
  12. Operation and Maintenance
      • Platform requirement
      • How to build
      • How to deploy
      • How to use the client API
  13. Future Works
      • Implement master-slave in the metaserver.
      • Support secondary indexes.
      • Support the join operation.
      • Compaction and compression.
  14. Contact Information. Thanks.