Be the first to like this
Citus is a distributed database that scales out Postgres. By using the extension APIs, Citus distributes your tables across a cluster of machines and parallelizes SQL queires. This talk describes the Citus architecture by focusing on our learnings in distributed systems. We first describe how Citus leverages PostgreSQL's extension APIs. These APIs are rich enough to store distributed metadata, add new commands to Postgres to help with sharding, parallelize and execute queries in a distributed cluster, and handle automatic failover of machines. Second, we show the architecture of a distributed query planner. We first describe the join order planner and describe how it chooses between broadcast, co-located, and repartition joins to minimize network I/O. We then show how we map SQL queries into distributed relational algebra, and optimize these plans for parallel execution. Third, we note a primary challenge in distributed systems. No single executor works great for all workloads. We show how Citus chooses between three executors, each one optimized for a different workload: NoSQL, operational analytics, and data warehousing. We then conclude with a demo that shows Citus running on a large cluster.