Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Citus	
  5.0	
  
Extending	
  PostgreSQL	
  to	
  Build	
  a	
  	
  
Distributed	
  Database	
  
Ozgun	
  Erdogan	
  
on	
...
Talk	
  Outline	
  
1.  IntroducEon	
  
2.  Citus	
  5.0	
  and	
  its	
  use	
  of	
  extension	
  APIs	
  
3.  Distribut...
What	
  is	
  Citus?	
  
•  Citus	
  extends	
  PostgreSQL	
  (not	
  a	
  fork)	
  to	
  provide	
  
it	
  with	
  distri...
Citus	
  5.0	
  Architecture	
  Diagram	
  
Events	
  
Citus	
  worker	
  1	
  
(PostgreSQL	
  +	
  
Citus	
  extension)	
...
When	
  is	
  Citus	
  a	
  good	
  fit?	
  
•  Scaling	
  a	
  mulE-­‐tenant	
  (B2B)	
  database	
  to	
  100K+	
  tenant...
SQL,	
  Scaling	
  out,	
  and	
  What’s	
  
Unique	
  About	
  PostgreSQL?	
  
“SQL	
  doesn’t	
  Scale”	
  
1.  Scaling-­‐out	
  is	
  hard.	
  Scaling	
  data,	
  compared	
  to	
  
scaling	
  comput...
Query	
  Languages:	
  An	
  Example	
  
SQL	
  RouEng	
  /	
  ReplicaEon	
  
•  Simple	
  INSERT	
  rouEng	
  and	
  replicaEon	
  
1.  Parse	
  plain	
  text	
  ...
Takeaway	
  
	
  
When	
  you’re	
  scaling	
  out	
  a	
  SQL	
  query,	
  your	
  
“query	
  distribuEon”	
  logic	
  ne...
How	
  to	
  overcome	
  this?	
  
1.  ApplicaEon	
  level	
  sharding	
  
2.  Build	
  a	
  distributed	
  database	
  fr...
PostgreSQL	
  Extension	
  APIs	
  
•  CREATE	
  EXTENSION	
  citus;	
  
•  Metadata	
  stored	
  in	
  Postgres	
  tables...
Citus	
  Planner	
  Example	
  
Citus	
  
Summary	
  
•  PostgreSQL’s	
  extensible	
  architecture	
  puts	
  it	
  
in	
  a	
  unique	
  place	
  to	
  scale	
  o...
Why	
  is	
  distributed	
  query	
  
planning	
  (SELECTs)	
  hard?	
  	
  	
  
Past	
  Experiences	
  
•  Built	
  a	
  similar	
  distributed	
  data	
  processing	
  engine	
  at	
  
Amazon	
  called...
Why	
  did	
  it	
  fail?	
  
•  You	
  can	
  solve	
  all	
  distributed	
  systems	
  
problems	
  in	
  one	
  of	
  t...
Bringing	
  data	
  to	
  computaEon	
  (1)	
  
Bringing	
  computaEon	
  to	
  data	
  (2)	
  
Slightly	
  more	
  complex	
  queries	
  
•  Sum(price):	
  sum(price)	
  on	
  worker	
  nodes	
  and	
  
then	
  sum()	...
CommutaEve	
  ComputaEons	
  
•  If	
  you	
  can	
  transform	
  your	
  computaEons	
  into	
  
their	
  commutaEve	
  f...
How	
  does	
  this	
  help	
  me?	
  
•  CommutaEve,	
  associaEve,	
  and	
  distribuEve	
  
properEes	
  hold	
  for	
 ...
Simple	
  SQL	
  query	
  
Distributed	
  Logical	
  Plan	
  (unopEmized)	
  
Distributed	
  Logical	
  Plan	
  (opEmized)	
  
Takeaway	
  
	
  
In	
  the	
  land	
  of	
  distributed	
  systems,	
  the	
  
commutaEve	
  (and	
  distribuEve)	
  prop...
One	
  example	
  doesn’t	
  make	
  a	
  proof	
  
•  Can	
  you	
  prove	
  this	
  model	
  is	
  complete?	
  
•  Rela...
MulE-­‐RelaEonal	
  Algebra	
  
•  Correctness	
  of	
  Query	
  ExecuEon	
  Strategies	
  in	
  
Distributed	
  Databases...
CommutaEve	
  Property	
  Rules	
  
DistribuEve	
  Property	
  Rules	
  
FactorizaEon	
  Rules	
  
Two	
  important	
  notes	
  (1)	
  
Logical	
  plan	
  ≠	
  Physical	
  plan	
  
•  “Join”	
  is	
  a	
  logical	
  opera...
Two	
  important	
  notes	
  (2)	
  
MulE-­‐relaEonal	
  Algebra	
  offers	
  a	
  complete	
  
foundaEon	
  for	
  distrib...
Summary	
  
•  To	
  scale	
  out,	
  you	
  need	
  to	
  transform	
  your	
  
computaEons	
  into	
  their	
  commutaEv...
Distributed	
  Query	
  ExecuEon	
  
across	
  Different	
  Workloads	
  
Different	
  Workloads	
  
1.  Simple	
  Insert	
  /	
  Update	
  /	
  Delete	
  /	
  Select	
  commands	
  
•  High	
  thr...
Different	
  Executors	
  
1.  Router	
  Executor:	
  Simple	
  Insert	
  /	
  Update	
  /	
  Delete	
  /	
  
Select	
  com...
Conclusions	
  
•  Distributed	
  relaEonal	
  databases	
  is	
  hard	
  
•  PostgreSQL	
  and	
  its	
  extension	
  API...
QuesEons	
  
hVps://citusdata.com	
  
Forums:	
  groups.google.com/forum/#!forum/
citus-­‐users	
  
Upcoming SlideShare
Loading in …5
×

Citus Architecture: Extending Postgres to Build a Distributed Database

765 views

Published on

Citus is a distributed database that scales out Postgres. By using the extension APIs, Citus distributes your tables across a cluster of machines and parallelizes SQL queires. This talk describes the Citus architecture by focusing on our learnings in distributed systems. We first describe how Citus leverages PostgreSQL's extension APIs. These APIs are rich enough to store distributed metadata, add new commands to Postgres to help with sharding, parallelize and execute queries in a distributed cluster, and handle automatic failover of machines. Second, we show the architecture of a distributed query planner. We first describe the join order planner and describe how it chooses between broadcast, co-located, and repartition joins to minimize network I/O. We then show how we map SQL queries into distributed relational algebra, and optimize these plans for parallel execution. Third, we note a primary challenge in distributed systems. No single executor works great for all workloads. We show how Citus chooses between three executors, each one optimized for a different workload: NoSQL, operational analytics, and data warehousing. We then conclude with a demo that shows Citus running on a large cluster.

Published in: Data & Analytics
  • Be the first to comment

  • Be the first to like this

Citus Architecture: Extending Postgres to Build a Distributed Database

  1. 1. Citus  5.0   Extending  PostgreSQL  to  Build  a     Distributed  Database   Ozgun  Erdogan   on  behalf  of  Citus  Data  team  
  2. 2. Talk  Outline   1.  IntroducEon   2.  Citus  5.0  and  its  use  of  extension  APIs   3.  Distributed  query  planning   4.  Different  distributed  executors  for  different   workloads   •  Three  technical  lightning  talks  in  one  
  3. 3. What  is  Citus?   •  Citus  extends  PostgreSQL  (not  a  fork)  to  provide   it  with  distributed  funcEonality.   •  Citus  scales-­‐out  Postgres  across  servers  using   sharding  and  replicaEon.  Its  query  engine   parallelizes  SQL  queries  across  many  servers.   •  Citus  5.0  is  open  source:  hVps://github.com/ citusdata/citus  
  4. 4. Citus  5.0  Architecture  Diagram   Events   Citus  worker  1   (PostgreSQL  +   Citus  extension)   …   …   …   …   Citus  coordinator   (PostgreSQL  +   Citus  extension)     Distributed  table   (metadata)   E1   E3’   Citus  worker  2   …   …   …   …   E2   E1’   Citus  worker  N   …   …   …   …   E3   E2’   …   Regular  tables   (1  shard  =     1  Postgres  table)  
  5. 5. When  is  Citus  a  good  fit?   •  Scaling  a  mulE-­‐tenant  (B2B)  database  to  100K+  tenants   •  Sub-­‐second  OLAP  queries  on  data  as  it  arrives   •  Powering  real-­‐Eme  analyEc  dashboards   •  Exploratory  queries  on  events  as  they  arrive   •  Who  is  using  Citus?   •  CloudFlare  uses  Citus  to  power  their  analyEc  dashboards   •  Neustar  builds  ad-­‐tech  infrastructure  with  HyperLogLog   •  Heap  powers  funnel,  segmentaEon,  and  cohort  queries  
  6. 6. SQL,  Scaling  out,  and  What’s   Unique  About  PostgreSQL?  
  7. 7. “SQL  doesn’t  Scale”   1.  Scaling-­‐out  is  hard.  Scaling  data,  compared  to   scaling  computaEons,  is  even  harder.   2.  SQL  means  different  things  to  different  people:   transacEonal  workloads,  short  reads/writes,  real-­‐ Eme  analyEcs,  data  warehousing,  or  triggers.   3.  SQL  doesn’t  have  the  no1on  of  “distribu1on”  built   into  the  language.  This  can  be  added  in,  but  not   there  in  SQL.  
  8. 8. Query  Languages:  An  Example  
  9. 9. SQL  RouEng  /  ReplicaEon   •  Simple  INSERT  rouEng  and  replicaEon   1.  Parse  plain  text  SQL  query   2.  Check  column  values  and  types  against  table  schema   3.  Apply  opEmizaEons,  such  as  constant  folding   4.  Determine  “billgates”  is  the  distribuEon  key   5.  Only  then  can  you  route  and  replicate  INSERT   •  What  about  my  SELECT  queries?  
  10. 10. Takeaway     When  you’re  scaling  out  a  SQL  query,  your   “query  distribuEon”  logic  needs  to  work   together  with  the  part  that  understands  the   query.  
  11. 11. How  to  overcome  this?   1.  ApplicaEon  level  sharding   2.  Build  a  distributed  database  from  scratch   3.  Extend  on  core  for  agreed  upon  use-­‐case   •  MulE-­‐master  for  replicaEon  and  HA;  parEEoning   •  Build  middleware  for  open  source  database   4.  Fork  an  open  source  database    
  12. 12. PostgreSQL  Extension  APIs   •  CREATE  EXTENSION  citus;   •  Metadata  stored  in  Postgres  tables   •  User-­‐defined  funcEons  to  extend  SQL  syntax   •  Hooks:  Planner,  executor,  and  uElity  hooks   •  Similar  to  interceptors  in  Java  frameworks  
  13. 13. Citus  Planner  Example   Citus  
  14. 14. Summary   •  PostgreSQL’s  extensible  architecture  puts  it   in  a  unique  place  to  scale  out  SQL  and  also   adapt  to  evolving  hardware  trends.   •  It  could  just  be  that  the  monolithic  SQL   database  is  dying.  If  so,  long  live  Postgres!  
  15. 15. Why  is  distributed  query   planning  (SELECTs)  hard?      
  16. 16. Past  Experiences   •  Built  a  similar  distributed  data  processing  engine  at   Amazon  called  CSPIT   •  Led  by  a  visionary  architect  and  built  by  an   extremely  talented  team   •  Scaled  to  (at  best)  a  dozen  machines.  Nicely   distributed  basic  computaEons  across  machines   •  Then  the  dream  met  reality  
  17. 17. Why  did  it  fail?   •  You  can  solve  all  distributed  systems   problems  in  one  of  two  days:   1.  Bring  your  data  to  the  computaEon   2.  Push  your  computaEon  to  the  data  
  18. 18. Bringing  data  to  computaEon  (1)  
  19. 19. Bringing  computaEon  to  data  (2)  
  20. 20. Slightly  more  complex  queries   •  Sum(price):  sum(price)  on  worker  nodes  and   then  sum()  intermediate  results   •  Avg(price):  Can  you  avg(price)  on  worker   nodes  and  then  avg()  intermediate  results?   •  Why  not?  
  21. 21. CommutaEve  ComputaEons   •  If  you  can  transform  your  computaEons  into   their  commutaEve  form,  then  you  can  push   them  down.   •  (a  +  b  =  b  +  a  ;  a  /  b  ≠  b  /  a)    (*)   •  AssociaEve  and  distribuEve  property  for  other   operaEons  (We  also  knew  about  this)  
  22. 22. How  does  this  help  me?   •  CommutaEve,  associaEve,  and  distribuEve   properEes  hold  for  any  query  language   •  We  pick  SQL  as  an  example  language   •  SQL  uses  RelaEonal  Algebra  to  express  a  query   •  If  a  query  has  a  WHERE  clause  in  it,  that’s  a   FILTER  node  in  the  relaEonal  algebra  tree  
  23. 23. Simple  SQL  query  
  24. 24. Distributed  Logical  Plan  (unopEmized)  
  25. 25. Distributed  Logical  Plan  (opEmized)  
  26. 26. Takeaway     In  the  land  of  distributed  systems,  the   commutaEve  (and  distribuEve)  property  is  king!   Transform  your  queries  with  respect  to  the  king,   and  they  will  scale!  
  27. 27. One  example  doesn’t  make  a  proof   •  Can  you  prove  this  model  is  complete?   •  RelaEonal  Algebra  has  10  operators   •  What  about  opEmizing  more  complex   plans  with  joins,  subselects,  and  other   constructs?  
  28. 28. MulE-­‐RelaEonal  Algebra   •  Correctness  of  Query  ExecuEon  Strategies  in   Distributed  Databases  Ceri  and  Pelagao,  1983   •  A  Distributed  Database  paper  from  a  more   civilized  age   •  Models  each  relaEonal  algebra  operator  as  a   distributed  operator  and  extends  it  
  29. 29. CommutaEve  Property  Rules  
  30. 30. DistribuEve  Property  Rules  
  31. 31. FactorizaEon  Rules  
  32. 32. Two  important  notes  (1)   Logical  plan  ≠  Physical  plan   •  “Join”  is  a  logical  operator.  HashJoin  or  MergeJoin  is  a   physical  operator.   •  It’s  easier  to  reason  about  logical  operators’   mathemaEcal  properEes  than  those  of  physical   operators.   •  Distributed  databases  that  start  from  a  “database”   usually  extend  physical  operators.  (Greenplum,   Redshis)    
  33. 33. Two  important  notes  (2)   MulE-­‐relaEonal  Algebra  offers  a  complete   foundaEon  for  distribuEng  SQL  queries.   •  Citus  is  adding  more  SQL  funcEonality  with  each   release.   •  From  a  use-­‐case  standpoint,  think  of  Citus  not  as   a  replacement  to  your  data  warehouse,  and   instead  as  extending  it  with  real-­‐Eme  capabiliEes.  
  34. 34. Summary   •  To  scale  out,  you  need  to  transform  your   computaEons  into  their  commutaEve  and   distribuEve  form.   •  Correctness  of  Query  ExecuEon  Strategies  in   Distributed  Databases  (1983)  offers  a   framework  to  do  this  for  relaEonal  algebra.  
  35. 35. Distributed  Query  ExecuEon   across  Different  Workloads  
  36. 36. Different  Workloads   1.  Simple  Insert  /  Update  /  Delete  /  Select  commands   •  High  throughput  and  low  latency   2.  Real-­‐Eme  Select  queries  that  get  parallelized  to  hundreds  of   shards  (<300ms)   3.  Long  running  Select  queries  that  join  large  tables   •  You  can’t  restart  a  Select  query  just  because  one  task  (or  one   machine)  in  1M  tasks  failed      
  37. 37. Different  Executors   1.  Router  Executor:  Simple  Insert  /  Update  /  Delete  /   Select  commands   2.  Real-­‐Eme  Executor:  Real-­‐Eme  Select  queries  that   touch  100s  of  shards  (<300ms)   3.  Task-­‐tracker  Executor:  Longer  running  queries  that   need  to  scale  out  to  10K-­‐1M  tasks      
  38. 38. Conclusions   •  Distributed  relaEonal  databases  is  hard   •  PostgreSQL  and  its  extension  APIs  are  unique   •  Citus  targets  real-­‐Eme  data  ingest  and   querying   •  Citus  5.0  is  open  source:  hVps://github.com/ citusdata/citus  
  39. 39. QuesEons   hVps://citusdata.com   Forums:  groups.google.com/forum/#!forum/ citus-­‐users  

×