
M|18 Securing Databases at Tencent Cloud


Published in: Data & Analytics

  1. 1. POLARDB for MyRocks: Extending shared storage to MyRocks. Zhang, Yuan, Alibaba Cloud, Feb 2018
  2. 2. MORE THAN JUST CLOUD Agenda
     • Background
     • Basic Architecture
     • Implementation details
     • Convert system tables to RocksDB
     • RocksDB WAL/Manifest Replication
     • DDL Replication
     • Cache Replication
     • Index Statistics Replication
     • New Log Format
     • MVCC
  3. 3. Background: Why POLARDB for MyRocks
     Benefits from MyRocks
     • Great space efficiency, better compression
     • Great write efficiency, lower write amplification
     • Fast data loading
     Benefits from shared storage
     • Promises data consistency
     • Ability to add read nodes immediately, without a full copy of the data
  4. 4. Basic Architecture
     Primary
     • Accepts read/write workload
     Replica
     • Accepts only read workload
     • Shares SST/WAL files with the Primary
     Binlog replication is replaced with WAL replication
  5. 5. Let’s Begin: prepare for RocksDB WAL replication
     • Based on AliSQL 5.7
     • Port MyRocks from Facebook
     • Remove InnoDB; only support the RocksDB and MyISAM engines
     • Convert system tables to RocksDB
  6. 6. Convert system tables to RocksDB: prepare for RocksDB WAL replication
     • Convert system tables to RocksDB
     • Exceptions: mysql.slow_log and mysql.general_log are stored on local disk, so the Primary and each Replica have their own copies of these two tables.
  7. 7. RocksDB WAL/Manifest replication: Architecture
  8. 8. RocksDB WAL/Manifest replication: Asynchronous replication
     WAL replication
     • Replay PUT/DELETE/MERGE
     Manifest replication
     • Replay flush & compaction
     WAL and Manifest coordination
     • Only apply a VEdit (version edit) when applied LSN > VEdit LSN
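
The coordination rule on this slide can be sketched as a small queue on the Replica. This is a minimal illustration, not the POLARDB code: `PendingEdit`, `ReplicaApplier`, and the idea that each manifest edit carries the WAL LSN at which it was written are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a manifest record (a version edit), tagged with
// the WAL position at which the Primary wrote it.
struct PendingEdit {
    std::uint64_t lsn;
    std::string description;
};

// Hypothetical Replica-side applier that keeps WAL replay and manifest
// replay in step.
class ReplicaApplier {
public:
    // Record that WAL entries up to `lsn` have been replayed.
    void AdvanceAppliedLsn(std::uint64_t lsn) {
        if (lsn > applied_lsn_) applied_lsn_ = lsn;
    }

    // Queue a manifest edit; edits are assumed to arrive in LSN order.
    void EnqueueEdit(PendingEdit edit) {
        assert(pending_.empty() || pending_.back().lsn <= edit.lsn);
        pending_.push_back(std::move(edit));
    }

    // Apply, in order, every queued edit for which applied LSN > edit LSN,
    // and return the descriptions of the edits that were applied.
    std::vector<std::string> ApplyReadyEdits() {
        std::vector<std::string> applied;
        while (!pending_.empty() && applied_lsn_ > pending_.front().lsn) {
            applied.push_back(pending_.front().description);
            pending_.pop_front();
        }
        return applied;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::uint64_t applied_lsn_ = 0;
    std::deque<PendingEdit> pending_;
};
```

Holding an edit back until the WAL replay has passed it means the Replica never switches to a set of SST files that reflects writes it has not yet replayed into its own memtable.
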
  9. 9. RocksDB WAL/Manifest replication: Control WAL and SST file deletion on the Primary
     WAL deletion: the original WAL deletion logic would cause a Replica to lose WAL files
     • Lm: min_log_number on the Primary
     • Ln: min_log_number across all Replicas
     • new_min_log_number = min(Lm, Ln)
     • A WAL can be deleted when its number < new_min_log_number
     SST deletion: the original SST deletion logic would leave a Replica unable to find an SST and cause it to crash
     • min_version_number: the minimum version number a Replica is using
     • An SST can be deleted only when it will no longer be used by the Primary or any Replica
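
The WAL deletion rule above reduces to a minimum over the log numbers still needed anywhere. A minimal sketch follows; the function names and the representation of each Replica's min_log_number as a vector entry are assumptions for illustration, and Ln is read here as the minimum across all Replicas.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// new_min_log_number = min(Lm, Ln), where Lm is the Primary's
// min_log_number and Ln is the smallest min_log_number of any Replica.
std::uint64_t NewMinLogNumber(std::uint64_t primary_min_log,
                              const std::vector<std::uint64_t>& replica_min_logs) {
    std::uint64_t result = primary_min_log;
    for (std::uint64_t n : replica_min_logs) result = std::min(result, n);
    return result;
}

// Return the WAL file numbers that may be deleted: those strictly below
// new_min_log_number.
std::vector<std::uint64_t> DeletableWals(
        const std::vector<std::uint64_t>& wal_numbers,
        std::uint64_t primary_min_log,
        const std::vector<std::uint64_t>& replica_min_logs) {
    const std::uint64_t limit = NewMinLogNumber(primary_min_log, replica_min_logs);
    std::vector<std::uint64_t> deletable;
    for (std::uint64_t n : wal_numbers) {
        if (n < limit) deletable.push_back(n);
    }
    return deletable;
}
```

For example, with the Primary at 12 and Replicas at 9 and 11, only WALs numbered below 9 are deletable, so the slowest Replica can still replay everything it needs.
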
  10. 10. DDL & Cache replication: Architecture
  11. 11. DDL replication: Remove .frm and .par files
     • Store their contents in RocksDB
     • A Replica can read multiple versions of a table schema
     • DDL replication is asynchronous
  12. 12. DDL replication
     Primary
     • Log MDL lock start and end
     Replica
     • Replay MDL lock start
       A. Lock MDL
     • Replay MDL lock end
       A. Update table cache in MyRocks
       B. Unlock MDL
     An MDL lock protects DDL operations on the Primary. The same lock is also needed when the Replica replays the DDL.
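
The Replica-side replay steps can be sketched as follows. This is an illustrative model only: `ReplicaDdlReplayer` is a hypothetical class, the set of table names stands in for real MySQL metadata locks, and the counter stands in for refreshing the MyRocks table cache.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of how a Replica replays the logged MDL start/end pair.
class ReplicaDdlReplayer {
public:
    // Replay "MDL lock start": take the lock for the table.
    // Returns false if the lock was already held.
    bool OnDdlStart(const std::string& table) {
        return locked_.insert(table).second;
    }

    // Replay "MDL lock end": first refresh the cached table definition,
    // then release the lock. Returns false if no lock was held.
    bool OnDdlEnd(const std::string& table) {
        auto it = locked_.find(table);
        if (it == locked_.end()) return false;
        ++cache_refreshes_;   // A. update table cache
        locked_.erase(it);    // B. unlock MDL
        return true;
    }

    bool IsLocked(const std::string& table) const {
        return locked_.count(table) != 0;
    }

    int cache_refreshes() const { return cache_refreshes_; }

private:
    std::set<std::string> locked_;
    int cache_refreshes_ = 0;
};
```

Refreshing the cache before unlocking keeps readers on the Replica from seeing the old table definition once the lock is released.
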
  13. 13. Cache replication: ACL, Procedure, and Query cache replication
     Primary
     • Log cache changes (ACL, Procedure, Query cache) in the RocksDB WAL
     Replica
     • Replay the change from the WAL and invalidate the affected cache
  14. 14. Index Statistics Replication
     Persistent
     • Partial index statistics are persisted in each SST
     • Total index statistics are stored in INDEX_STATISTICS
     Memory
     • Rdb_key_def::m_stats
     Update
     • Analyze table
     • Flush memtable
     • Compact
     These update operations are logged and replayed on the Replica
  15. 15. New Log Format: log changes for replication
     Log types
     • DDL (START, END)
     • Cache change, ACL/Proc
     Log format
     • PUT/DELETE
     Log storage location
     • __system__ column family
  16. 16. New Log Format: New type in data dictionary
     // Data dictionary types
     enum DATA_DICT_TYPE {
       DDL_ENTRY_INDEX_START_NUMBER = 1,
       INDEX_INFO = 2,
       CF_DEFINITION = 3,
       BINLOG_INFO_INDEX_NUMBER = 4,
       DDL_DROP_INDEX_ONGOING = 5,
       INDEX_STATISTICS = 6,
       MAX_INDEX_ID = 7,
       DDL_CREATE_INDEX_ONGOING = 8,
       POLAR_LOG = 100, // for polar replication
       END_DICT_INDEX_ID = 255
     };
     enum POLAR_LOG_TYPE {
       TABLE_DDL = 1,
       CACHE_CHANGE = 2,
       ……
       END_POLAR_ROCK_TYPE = 255
     };
  17. 17. New Log Format: New type in data dictionary
     DDL_START
     • type: PUT
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     DDL_END
     • type: DELETE
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     CACHE_CHANGE
     • type: PUT
     • key: POLAR_LOG+CACHE_CHANGE+ACL/Proc
     • value: NULL
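
One way to picture the key layout is a small key builder. The slides give only the logical form POLAR_LOG+TABLE_DDL+dbname.tablename; the byte encoding below (each enum value as a 4-byte big-endian integer, followed by the raw name) is an assumption for illustration, as are the helper names. The enums are abbreviated to the values used here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Abbreviated copies of the enums from the deck, limited to the values used.
enum DATA_DICT_TYPE : std::uint32_t { POLAR_LOG = 100 };
enum POLAR_LOG_TYPE : std::uint32_t { TABLE_DDL = 1, CACHE_CHANGE = 2 };

// Append a 32-bit value in big-endian byte order (assumed encoding).
void AppendBigEndian32(std::string* out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((v >> shift) & 0xFF));
    }
}

// Build POLAR_LOG + <log type> + <suffix>.
std::string MakePolarLogKey(POLAR_LOG_TYPE type, const std::string& suffix) {
    std::string key;
    AppendBigEndian32(&key, POLAR_LOG);
    AppendBigEndian32(&key, type);
    key += suffix;
    return key;
}

// Key shared by DDL_START (written with PUT) and DDL_END (written with DELETE).
std::string MakeTableDdlKey(const std::string& db, const std::string& table) {
    assert(!db.empty() && !table.empty());
    return MakePolarLogKey(TABLE_DDL, db + "." + table);
}
```

Because DDL_START and DDL_END use the same key, a PUT followed by a DELETE leaves no record behind, so any TABLE_DDL key still present marks a DDL that started but did not finish.
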
  18. 18. New Log Format: Problems
     DDL_START
     • type: PUT
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     DDL_END
     • type: DELETE
     • key: POLAR_LOG+TABLE_DDL+dbname.tablename
     • value: NULL
     DDL_START and DDL_END must be a pair.
     Problem 1: Primary crash
     • If the Primary crashes after DDL_START, it will resend DDL_START on restart, and the DDL_END matching the earlier DDL_START is lost.
     • The Replica replays the earlier DDL_START and holds the MDL lock, which is never released by a DDL_END.
     Problem 2: Replica crash
     • If the Replica crashes after DDL_START, it continues by replaying DDL_END on restart.
     • But the lock taken by DDL_START no longer exists after the restart, so the Replica replays DDL_END to release an MDL lock that does not exist.

  19. 19. New Log Format: Solutions
     DDL_START and DDL_END must be a pair.
     Problems
     Primary crash
     • If the Primary crashes after DDL_START, it will resend another DDL_START on restart, and the DDL_END matching the earlier DDL_START is lost.
     • The Replica replays the earlier DDL_START and holds the MDL lock, which is never released by a DDL_END.
     Replica crash
     • If the Replica crashes after DDL_START, it continues by replaying DDL_END on restart.
     • But the lock taken by DDL_START no longer exists after the restart, so the Replica replays DDL_END to release an MDL lock that does not exist.
     Solutions
     Primary crash
     • On restart, the Primary scans RocksDB for TABLE_DDL records; if one is found, the Primary resends DDL_END, and the Replica releases the old lock.
     Replica crash
     • On restart, the Replica scans RocksDB for TABLE_DDL records; if one is found, the Replica replays DDL_START to retake the lock.
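
The restart handling can be sketched with an ordered map standing in for the `__system__` column family. Everything here is a simplified model under stated assumptions: `SystemCf`, the readable prefix `POLAR_LOG/TABLE_DDL/` (used instead of a binary key encoding), and the function names are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Ordered key/value map standing in for the __system__ column family.
using SystemCf = std::map<std::string, std::string>;

// Readable stand-in for the POLAR_LOG+TABLE_DDL key prefix.
const std::string kTableDdlPrefix = "POLAR_LOG/TABLE_DDL/";

// Prefix scan: return the table names that still have a TABLE_DDL record,
// i.e. a DDL_START (PUT) with no matching DDL_END (DELETE).
std::vector<std::string> FindOpenDdls(const SystemCf& cf) {
    std::vector<std::string> tables;
    for (auto it = cf.lower_bound(kTableDdlPrefix);
         it != cf.end() &&
         it->first.compare(0, kTableDdlPrefix.size(), kTableDdlPrefix) == 0;
         ++it) {
        tables.push_back(it->first.substr(kTableDdlPrefix.size()));
    }
    return tables;
}

// Primary restart: resend DDL_END for every open DDL by deleting its record.
// Returns the tables for which DDL_END was resent.
std::vector<std::string> RecoverPrimary(SystemCf* cf) {
    std::vector<std::string> open = FindOpenDdls(*cf);
    for (const std::string& table : open) cf->erase(kTableDdlPrefix + table);
    return open;
}

// Replica restart: replay DDL_START for every open DDL, so the lock exists
// again when the matching DDL_END arrives. Returns the tables to lock.
std::set<std::string> RecoverReplicaLocks(const SystemCf& cf) {
    std::vector<std::string> open = FindOpenDdls(cf);
    return std::set<std::string>(open.begin(), open.end());
}
```

Both sides rely on the same observation: a leftover TABLE_DDL record is the durable evidence of an unpaired DDL_START, so a scan at startup is enough to restore the pairing.
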
  20. 20. MVCC: MVCC based on RocksDB snapshots
     Control compaction on the Primary
     • Compaction on the Primary must take the Replica's snapshots into account
     • Only delete a record when its sequence >= Sn, where Sn is the minimum snapshot sequence on the Replica
     Control flush on the Replica
     • After a memtable flush, data needed by a Replica snapshot may be missing from the SSTs because of compaction on the Primary
     • Only flush when the memtable's minimum sequence >= Sn, where Sn is the minimum snapshot sequence on the Replica
     Keep a consistent snapshot on the Replica
  21. 21. Future
     Feature
     • Online DDL
     • HA
     Performance
     • Multi-write WAL
     • Asynchronous commit
  22. 22. THANK YOU
