Introduction To Maxtable

Xue Yingfei
http://code.google.com/p/maxtable/

Agenda

Architecture Overview
Key Features
Maxtable Query Language (MQL)
Operation and Maintenance
Future Work

5 Mar 2012 2

Architecture Overview ( 1 )

Maxtable consists of three components:
1. Metadata server: provides the global namespace for all the tables in the system.
It keeps the B-tree structure in memory.
2. Ranger server: holds some ranges of the data. The default size of one range
is about 100 GB.
3. Client library: linked with applications, it enables them to read and write
data stored in Maxtable.
[Figure: the components in the system and how they relate to one another.]


How is a table stored on disk?

One SSTable = 4 MB of data.
One tablet = 25K SSTables = one range = 100 GB.
One table = 42K tablets.
So one table can hold more than 4 PB of data. To hold more, we can increase the block
size or use two tablet levels to store the index data.
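
The sizing above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and reads "4M", "25K" and "42K" as 4 MB, 25,000 and 42,000; the constant names are made up for the example.

```python
# Assumption: decimal units. The slide only gives "4M", "100G" and "4PB".
MB = 10**6
GB = 10**9
PB = 10**15

SSTABLE_BYTES = 4 * MB         # one SSTable = 4 MB of data
SSTABLES_PER_TABLET = 25_000   # one tablet = 25K SSTables
TABLETS_PER_TABLE = 42_000     # one table = 42K tablets

tablet_bytes = SSTABLE_BYTES * SSTABLES_PER_TABLET
table_bytes = tablet_bytes * TABLETS_PER_TABLE

print(tablet_bytes / GB)  # size of one tablet (one range) in GB
print(table_bytes / PB)   # size of one full table in PB
```

Under these assumptions one tablet comes to 100 GB and one table to 4.2 PB, which matches the "more than 4 PB" figure.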


How does Maxtable work?
• Maxtable stores data in tables, sorted by a primary key (the first column).
• Table data has two types: varchar (string) and int (number).
• Scaling is achieved by automatically splitting tables into contiguous ranges and
assigning them to different physical machines.
• A Maxtable cluster has two types of servers: Ranger Servers, which hold some ranges
of the data, and Meta Servers, which handle metadata management and oversee the
Ranger Servers.
• A single Ranger Server may hold many contiguous ranges. The Meta Server is
responsible for farming them out in an intelligent way.
• When a single range fills up, it is split in half (middle split). The top half stays
in the current range and a new range is allocated for the lower half. Both ranges
stay on the current Ranger Server until that server becomes overloaded. The
Rebalancer then triggers the Meta Server to reassign some ranges from the
overloaded Ranger Server to other Ranger Servers that have enough space.
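
The routing and middle-split behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Maxtable's implementation: the class `RangeMap`, its fields, the row-count capacity and the server name `ranger-1` are all hypothetical, and the sketch keeps the first half of the sorted keys in the current range and moves the second half to the new one.

```python
from bisect import bisect_right


class RangeMap:
    """Toy model of one table split into contiguous key ranges."""

    def __init__(self, capacity):
        self.capacity = capacity      # max rows per range before a split
        self.starts = [""]            # lowest key of each range, kept sorted
        self.ranges = [[]]            # sorted keys held by each range
        self.servers = ["ranger-1"]   # hypothetical server holding each range

    def locate(self, key):
        """Return the index of the range whose key interval contains key."""
        return bisect_right(self.starts, key) - 1

    def insert(self, key):
        i = self.locate(key)
        rows = self.ranges[i]
        rows.insert(bisect_right(rows, key), key)
        if len(rows) > self.capacity:
            self._split(i)

    def _split(self, i):
        """Middle split: one half stays, the other half goes to a new range."""
        rows = self.ranges[i]
        mid = len(rows) // 2
        self.ranges[i] = rows[:mid]
        self.ranges.insert(i + 1, rows[mid:])
        self.starts.insert(i + 1, rows[mid])
        # The new range stays on the same server until a rebalance moves it.
        self.servers.insert(i + 1, self.servers[i])


table = RangeMap(capacity=4)
for k in ["adidas", "lining", "nike", "puma", "anta"]:
    table.insert(k)

print(table.starts)           # range boundaries after the split
print(table.locate("nike"))   # index of the range that serves "nike"
```

The fifth insert overflows the single range, so it is split in the middle and later lookups are routed by binary search over the range start keys.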

Key Features ( 1 )

Scalability:
• New ranger nodes can be added as storage needs grow. The system automatically
adapts to the new nodes when the rebalance runs.
Data writes:
• When an application inserts data, the write can be cached at the ranger server, and
the cache is flushed periodically. For consistency, the application forces one data
log to be flushed to disk.
SSTable Map:
• This feature reduces data consistency control and improves write performance. It
uses an innovative method that needs no locking for concurrent writes to resolve
conflicts between them.
Cache All Data:
• Maxtable can cache all the metadata in the metaserver and the hot data in the
ranger server.
Re-balancing:
• A tool rebalances the tablets among the rangerservers to help balance the
workload among nodes.
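
As an illustration of the re-balancing idea, here is a minimal sketch of a greedy rebalance. It is not Maxtable's algorithm: it assumes load is measured by tablet count alone, and the function `rebalance`, the ranger names and the tablet names are hypothetical.

```python
def rebalance(assignment):
    """Move tablets from the fullest ranger to the emptiest one.

    assignment maps a ranger name to the list of tablet names it holds.
    Stops when tablet counts differ by at most one and returns the list
    of (tablet, source, target) moves that were performed.
    """
    moves = []
    while True:
        source = max(assignment, key=lambda r: len(assignment[r]))
        target = min(assignment, key=lambda r: len(assignment[r]))
        if len(assignment[source]) - len(assignment[target]) <= 1:
            return moves
        tablet = assignment[source].pop()
        assignment[target].append(tablet)
        moves.append((tablet, source, target))


cluster = {
    "ranger-1": ["t1", "t2", "t3", "t4", "t5"],
    "ranger-2": ["t6"],
    "ranger-3": [],   # a newly added, empty node
}
moves = rebalance(cluster)
for tablet, source, target in moves:
    print(f"move {tablet}: {source} -> {target}")
```

A real rebalancer would likely weigh data size and request load rather than a bare tablet count.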

Key Features ( 2 )

Index:
• Maxtable automatically builds one unique index for each table on the first column.
Recovery:
• Maxtable implements write-ahead logging (WAL) to make writes safe. It can recover
a crashed server by replaying its log.
Failover:
• The metaserver maintains a heartbeat with each rangerserver. When the metaserver
detects that a rangerserver is unreachable, it fails over the data service located
on the crashed rangerserver to another rangerserver, which continues to serve the
range.
Metadata Consistency Checking (MCC):
• Data-checking tools that ensure data consistency between the metaserver and the
rangerservers.
Backend Storage:
• Maxtable’s backend storage can be a distributed file system. Currently it can use
KFS as its backend.
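
The Recovery feature above relies on write-ahead logging. The following is a minimal sketch of the general technique, not Maxtable's log format: the class `LoggedStore`, the JSON-lines record layout and the file name `ranger.log` are assumptions made for the example, and torn (partially written) records are not handled.

```python
import json
import os
import tempfile


class LoggedStore:
    """In-memory key/value store that logs each write before applying it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.rows = {}
        self._replay()

    def _replay(self):
        """Rebuild the in-memory state from the log after a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.rows[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the record to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then update the in-memory copy.
        self.rows[key] = value


log_path = os.path.join(tempfile.mkdtemp(), "ranger.log")
store = LoggedStore(log_path)
store.put("adidas", 1000)
store.put("lining", 800)

# Simulate a crash: build a fresh store from the log file alone.
recovered = LoggedStore(log_path)
print(recovered.rows)
```

Because every write reaches the log before the in-memory state changes, replaying the log from the start reproduces all acknowledged writes.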

Key Features ( 3 )

Range Query
• Supports range queries on the index column or on a non-index column.
• Supports AND and OR in the WHERE clause.
• Splits the work across all the range nodes in a cluster.
Sharding
• Automatic sharding: tablets are distributed over the range servers.
• Manual sharding: scans all the tablets and splits those that have at least two
blocks containing data. Customers who want better scaling can shard tablets
manually.
• Manual sharding is generally followed by a rebalance operation, because sharding
may create new tablets that need to be rebalanced.

Maxtable Query Language ( 1 )

CREATE TABLE
• Create one table.
– create table table_name (column1 type1, ..., columnx typex)
– create table blogdata (key varchar, num int, createtime varchar, comment varchar)
INSERT
• Insert one data row.
– insert into table_name (column1_value, ..., columnx_value)
– insert into blogdata (adidas, 1000, 2011-10-11, good)
SELECT
• Select one row by the default key column.
– select table_name (column1_value)
– select blogdata (adidas)
SELECTRANGE
• Select the rows in a user-specified key range.
– selectrange table_name (column1_value1, column1_value2)
– selectrange blogdata (adidas, lining)
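
The statement shapes above are regular enough to generate from a program. The sketch below only builds MQL strings in the forms shown on this slide; the helper names are hypothetical, it does not call the Maxtable client library, and it does no quoting or validation.

```python
def create_table(table, columns):
    """columns is a list of (name, type) pairs, type being varchar or int."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return f"create table {table} ({cols})"


def insert_row(table, values):
    """values are given in column order; the first one is the key."""
    joined = ", ".join(str(v) for v in values)
    return f"insert into {table} ({joined})"


def select_row(table, key):
    return f"select {table} ({key})"


def select_range(table, low, high):
    return f"selectrange {table} ({low}, {high})"


statements = [
    create_table("blogdata", [("key", "varchar"), ("num", "int"),
                              ("createtime", "varchar"), ("comment", "varchar")]),
    insert_row("blogdata", ["adidas", 1000, "2011-10-11", "good"]),
    select_row("blogdata", "adidas"),
    select_range("blogdata", "adidas", "lining"),
]
for statement in statements:
    print(statement)
```

The generated strings reproduce the four blogdata examples from the slide.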


SELECTWHERE
• Select rows by the WHERE clause.
– selectwhere table_name where columnX_name(columnX_value1, columnX_value2) and
columnY_name(columnY_value1, columnY_value2)
SELECTCOUNT
• Get the number of rows that match the WHERE clause.
– selectcount table_name where columnX_name(columnX_value1, columnX_value2) and
columnY_name(columnY_value1, columnY_value2)
SELECTSUM
• Get the sum of one column's values over the rows that match the WHERE clause.
– selectsum (column_name) table_name where columnX_name(columnX_value1, columnX_value2)
and columnY_name(columnY_value1, columnY_value2)
DELETE
• Delete one row.
– delete table_name (column1_value)
DROP TABLE
• Drop one table.
– drop table_name
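
The WHERE-based statements share one clause shape, which a small helper can assemble. This is a sketch with hypothetical helper names: it assumes each columnX_name(value1, value2) term is a value range on that column and that "or" is written in the same position as "and"; the example column names come from the blogdata table used earlier in the deck. It builds strings only and does not talk to a server.

```python
def where_clause(conditions, op="and"):
    """Join (column, low, high) triples into the body of a WHERE clause."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    return f" {op} ".join(f"{col}({low}, {high})" for col, low, high in conditions)


def select_where(table, conditions, op="and"):
    return f"selectwhere {table} where {where_clause(conditions, op)}"


def select_count(table, conditions, op="and"):
    return f"selectcount {table} where {where_clause(conditions, op)}"


def select_sum(column, table, conditions, op="and"):
    return f"selectsum ({column}) {table} where {where_clause(conditions, op)}"


conditions = [("key", "adidas", "lining"), ("num", 500, 2000)]
rows_query = select_where("blogdata", conditions)
count_query = select_count("blogdata", conditions, op="or")
sum_query = select_sum("num", "blogdata", conditions)

for query in (rows_query, count_query, sum_query):
    print(query)
```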

The following commands are for administrators.
SHARDING
• Shard one table.
– sharding table_name
MCC CHECKRANGER
• Check the state of the rangers.
– mcc checkranger
MCC CHECKTABLE
• Check the data of a table.
– mcc checktable table_name
REBALANCE
• Rebalance the data load across the rangers.
– rebalance table_name

Operation and Maintenance

Platform requirements
• http://code.google.com/p/maxtable/wiki/Platform
How to build
• http://code.google.com/p/maxtable/wiki/03HowToInstall
• http://code.google.com/p/maxtable/wiki/05HowToBuildWithKFSFacer
How to deploy
• http://code.google.com/p/maxtable/wiki/04HowToDeploy
How to use the client API
• http://code.google.com/p/maxtable/wiki/08ClientSampleCode

Future Work

Implement master-slave for the metaserver.
Support secondary indexes.
Support the join operation.
Compaction and compression.

Contact Information

yingfei.xue@gmail.com

Thanks
