POLARDB: A database architecture for the cloud

Presented at RootConf, Bangalore, June 22, 2019

  1. POLARDB: A database architecture for the cloud
  2. ØYSTEIN GRØVLEN, Sr. Staff Engineer @ Alibaba Cloud. Bio: Before joining Alibaba, Øystein worked for 10 years in the MySQL optimizer team at Sun/Oracle. At Sun Microsystems, he was also a contributor on the Apache Derby project and Sun's Architectural Lead on Java DB. Prior to that, he worked for 10 years on development of Clustra, a highly available DBMS.
  3. Databases inside Alibaba Group. Key figures: 1 Trillion USD; 100M; PB level. 2018 Sales ($): Alibaba Singles' Day (11.11) 30.8B; Cyber Monday 7.9B; Amazon Prime Day 4.19B. *Data source: Forbes, CNBC, PracticalEcommerce and DigitalCommerce360
  4. Database Scalability Challenge in Alibaba. Singles' Day load: ~100x. RT latency: unchanged. Cloud shifts fixed CapEx to variable OpEx.
  5. 83% of Enterprise Workloads Will Be In The Cloud By 2020
  6. Data Explosion: data at large scale; increased expense; hard to utilize. Diagram labels: Generated by Human, Generated by Things.
  7. Cloud Native Database: Requirements. Scalable: auto-scaling, load, storage. Highly available: data redundancy, automatic failover, zero downtime. Integrate with Cloud Services: DBaaS, security, AI, serverless, monitoring. Diagram label: CLOUD NATIVE.
  8. POLARDB: Cloud Native Database. Emerging Hardware: NVM, RDMA, FPGA. Serverless: auto scaling, paid by usage, zero downtime. Security: encryption, audit, access control. Intelligence: self-configuration, self-optimization, self-diagnosis, self-healing. Diagram labels: CLOUD NATIVE, User Oriented.
  9. Storage Revolution: PolarStore. Diagram labels: Transaction; Architecture: Separation of Storage and Computation; Database; Storage Engine; Computation Offloading; Storage; Compatibility; Security; HTAP; Multi-Model; Usability; Self-Driving; Manageability.
  10. PolarStore: Architecture overview. Design for emerging hardware; low latency oriented; Active R/W, Active RO; high availability. Key components: 1. libpfs 2. PolarSwitch 3. ChunkServer 4. PolarCtrl. Diagram labels: Host1 and Host2, each with POLARDB, libpfs and PolarSwitch; volume 1 and volume 2 with chunk1 and chunk2; ChunkServers holding chunks; ParallelRaft; PolarCtrl with metadata; data route and control route.
  11. PolarStore: Design for Emerging Hardware. No context switch; OS-bypass and zero-copy; network over RDMA; parallel random I/O absorbed by Optane; excellent performance with fewer long-tail latency issues; no need for over-provisioning; WAL log in 3D XPoint Optane; PolarDB writes to shm. Diagram labels: POLARDB with libpfs, memory and RDMA-NIC; RDMA network; Chunkserver 1, 2 and 3, each with RDMA-NIC, Optane, NVMe SSDs and memory.
  12. PolarFS: a POSIX distributed file system designed closely with the DB. Pure user space for extra-low latency: no system calls, no context switch, zero data copy. POSIX semantics: easy porting. Low latency oriented. Diagram labels: Node 1, Node 2 and Node 3, each with POLARDB and libpfs; journal file (head, pending tail, tail); Paxos file; POLARDB cluster file system metadata cache: directory tree (root), file mapping table, block mapping table (FileID, FileBlk, VolBlk); database volume chunks. Reference: PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database (VLDB 2018)
  13. Database Architecture Revolution: Separation of Storage and Computation. Diagram labels: Transaction; Architecture: Separation of Storage and Computation; Database; Storage Engine; Computation Offloading; Storage; Compatibility; Security; HTAP; Multi-Model; Usability; Self-Driving; Manageability.
  14. Cloud Native Architecture • Scale compute and storage independently • Shared storage • Cross-AZ failover without data loss • Optimize division of functionality between storage and compute • Tight integration with other cloud components like metering, monitoring, control plane • Optimize for hardware in the data centers • Compatible with MySQL/PG etc. • Security. Diagram labels: PolarProxy; PolarStore; POLARDB; Intelligent proxy; 100% Compatible; Storage Optimized For Database; PolarFS.
  15. Dynamic Scaling. Fast scaling: upgrade from 2 vCPU to 32 vCPU in only 5 minutes; add more replicas in only 5 minutes. Diagram labels: MySQL with a master and replicas, each on local storage; POLARDB with a master and replicas on shared storage. Lower cost: 30%~50% off. Chart: total costs of 4 vCPU, 32 GB memory, 500 GB storage with different replica numbers (1, 2, 3, 4, 5 and 10 replicas), RDS MySQL vs. POLARDB; value axis 0 to 40,000; data labels as extracted: 20,949, 11,349, 9,749, 8,149, 6,549, 4,949 and 39,844, 20,102, 16,811, 13,521, 10,230, 6,940.
  16. Shared Nothing Logical Replication vs. Shared Storage Physical Replication. Physical replication is much more reliable than logical replication. Diagram labels: Master, Slave, Local Storage, POLARSTORE, Data, Binlog, Redo log.
  17. Shared Nothing Logical Replication vs. Shared Storage Physical Replication: non-blocking low-latency DDL synchronization. MySQL timeline (Master, Slave): Add Column running 1 hour; Add Column blocked 1 hour; applying DDL will block following events. POLARDB (Master, Slave, Shared Storage): Add Column; update data files; update metadata; need not modify data files.
  18. Physical Replication by Redo Log. Diagram labels: Primary (commit, async flush, buffer pool, data and log in memory); Shared Storage (data file, redo log); RO Node (log parse, hash table, redo, buffer pool, write memory, query); transactions T1 to T5; snapshot of T4; continuous recovery; consistent snapshot read.
  19. Physical Replication: Page from Past. Avoid data gap; control purge; oldest read view; checkpoint LSN (T1). Diagram labels: Primary; Shared Storage (data, redo log, checkpoint T1); RO Node (log parse, hash table, redo, buffer pool); transactions T1 to T5; snapshot of T4; purgeable and unpurgeable.
  20. Physical Replication: Page from Future. Avoid data overstep; control flush of data files; LSN of the latest applied redo log. Diagram labels: Primary; Shared Storage (data, redo log); RO Node (log parse, hash table, redo, buffer pool); transactions T1 to T5; snapshot version T4; unflushable: T5; flushable: T4, T3, T2.
  21. Single Master. Single endpoint; transparent failover; attack protection; causally consistent read. Proxy cluster: read/write split, high availability, load balance, security. Diagram labels: Application; Proxy Cluster; Master; Replica (x3); Shared Storage.
  22. Read and Write Separation: Session Consistent. Problem: can't read latest data (1. UPDATE on the master moves it to LSN 35; 2. SELECT reaches a replica still at LSN 30). Solved: 1. UPDATE; 2. master returns LSN=35; 3. SELECT requires LSN>=35. Example: connection.query { UPDATE user SET name='Jimmy' WHERE id=1; COMMIT; SELECT name FROM user WHERE id=1; // name is Jimmy } SELECT can always get the latest data. LSN: Log Sequence Number. Diagram labels: Application; Smart Proxy (read and write separation, load balance module); POLARDB Cluster (M, R1, R2).
  23. Multi-master. Diagram labels: Database Cluster (Compute) with DB Servers in AZ1, AZ2 and AZ3; Storage Service with Segment Servers and segment 1 in each AZ; 3-AZ Persistent Core (append-only); Storage Cluster with CS1 to CS6; Storage Monitoring; links: TCP, RDMA, page read, redo.
  24. HTAP: Hybrid Transaction and Analytical Processing. Diagram labels: OLTP Applications; OLAP Applications; Smart Proxy (read/write split, load balance); OLTP Read-Write; OLTP Read-Only (x2); OLAP; Shared Storage.
  25. HTAP: Parallel Query. Reduce latency of complex queries. One query, multiple workers on server. Parallel scan on storage engine. Chart: DBT3 Query 6, linear scalability; axis ticks 1 to 32 and 2 to 1024; series tpch5, tpch10, tpch20 and tpch40, each with an ideal line. Diagram labels: Workers; Storage Engine.
  26. POLARDB: Database for the Cloud. Separation of storage and compute: independent scaling, lower cost. Shared storage: high throughput, low latency, high availability, fast scaling (no data copy). Physical replication: less I/O, non-blocking DDL, efficient parallel redo on slaves. Parallel query execution: lower latency for complex queries.
  27. Thank You
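
The session-consistent read flow on slide 22 (UPDATE, master returns an LSN, SELECT requires at least that LSN) can be sketched as follows. This is a minimal illustration, not PolarProxy's actual implementation: the names `Node` and `SessionProxy` are hypothetical, and falling back to the master when no replica has caught up is an assumption (a real proxy might wait for a replica instead).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A database node that tracks how far it has applied the redo log."""
    name: str
    applied_lsn: int = 0


@dataclass
class SessionProxy:
    """Hypothetical proxy doing read/write splitting for a single session."""
    master: Node
    replicas: list
    session_lsn: int = 0  # LSN returned by this session's most recent write

    def write(self, new_lsn: int) -> Node:
        # Writes always go to the master; remember the LSN it reports back.
        self.master.applied_lsn = new_lsn
        self.session_lsn = new_lsn
        return self.master

    def read(self) -> Node:
        # Pick the first replica that has applied the session's last write.
        for replica in self.replicas:
            if replica.applied_lsn >= self.session_lsn:
                return replica
        # Assumption: if no replica is fresh enough, read from the master.
        return self.master


# Scenario from the slide: R1 is still at LSN 30, R2 has reached LSN 35.
proxy = SessionProxy(Node("M", 30), [Node("R1", 30), Node("R2", 35)])
proxy.write(35)             # 1. UPDATE, 2. master returns LSN=35
target = proxy.read()       # 3. SELECT requires LSN>=35
print(target.name)
```

With the values from the slide, the read is routed past R1 to R2, so the session sees its own write without sending every read to the master.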
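
The "page from future" constraint on slide 20 (control flushing of data files using the LSN of the latest applied redo log) can be read as a simple rule. The sketch below assumes that the primary may flush a dirty page to shared storage only when its newest change is no later than the redo position applied by the slowest RO node; the function names and the use of the minimum across RO nodes are assumptions made for illustration.

```python
def can_flush(page_newest_lsn: int, ro_applied_lsns: list) -> bool:
    """Return True if flushing the page cannot expose a 'future' page to any RO node."""
    if not ro_applied_lsns:
        # Assumption: with no RO nodes attached there is nothing to protect.
        return True
    # The slowest RO node bounds what may reach shared storage.
    return page_newest_lsn <= min(ro_applied_lsns)


def flushable_pages(dirty_pages: dict, ro_applied_lsns: list) -> list:
    """Filter a {page_id: newest_lsn} map down to the pages that may be flushed now."""
    return sorted(
        page_id
        for page_id, newest_lsn in dirty_pages.items()
        if can_flush(newest_lsn, ro_applied_lsns)
    )


# Slide scenario: the RO node has applied redo up to T4, so a page changed by T5 must wait.
dirty = {"page_a": 2, "page_b": 4, "page_c": 5}
print(flushable_pages(dirty, ro_applied_lsns=[4]))
```

Here transaction numbers stand in for LSNs, matching the T1 to T5 labels on the slide.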
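
The "one query, multiple workers" idea on slide 25 can be illustrated with a partitioned scan: each worker filters and aggregates its own slice of the table, and the partial results are merged. This is a structural sketch only, with hypothetical names (`scan_partition`, `parallel_sum`) and an in-memory list standing in for a table; Python threads will not reproduce the speed-up shown in the chart.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate, value):
    """One worker: scan its partition and return a partial sum."""
    return sum(value(row) for row in rows if predicate(row))


def parallel_sum(rows, predicate, value, workers=4):
    """Split rows into contiguous partitions, scan them concurrently, merge partial sums."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda part: scan_partition(part, predicate, value), partitions
        )
        return sum(partials)


# A filter-and-sum query in the spirit of DBT3 Query 6, using integer values.
table = [{"quantity": q, "price": q * 10} for q in range(100)]
total = parallel_sum(
    table,
    predicate=lambda row: row["quantity"] < 24,
    value=lambda row: row["price"],
)
print(total)
```

Because the aggregate is a sum, the merge step is a plain addition of partial results, which is what makes this kind of query a good fit for parallel scanning.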