Inside PostgreSQL Shared Memory

This presentation is for people who want to understand how PostgreSQL shares information among processes using shared memory. Topics covered include the internal data page format, usage of the shared buffers, locking methods, and various other shared memory data structures.

Published in: Technology, Education

  1. Inside PostgreSQL Shared Memory. Bruce Momjian, EnterpriseDB, January 2009. Abstract: PostgreSQL is an open-source, full-featured relational database. This presentation gives an overview of the shared memory structures used by Postgres. Creative Commons Attribution License. http://momjian.us/presentations
  2. Outline: (1) File storage format; (2) Shared memory creation; (3) Shared buffers; (4) Row value access; (5) Locking; (6) Other structures.
  3. File System: /data. [Diagram: Postgres processes and the /data directory they work on.]
  4. File System: /data/base. [Diagram: /data contains /base, /global, /pg_clog, /pg_multixact, /pg_subtrans, /pg_tblspc, /pg_twophase, and /pg_xlog.]
  5. File System: /data/base/db. [Diagram: /data/base contains one directory per database: /16385 (production), /1 (template1), /16821 (test), /17982 (devel), and /21452 (marketing).]
  6. File System: /data/base/db/table. [Diagram: /data/base/16385 contains one file per table: /24692 (customer), /27214 (order), /25932 (product), /25952 (employee), and /27839 (part).]
  7. File System: Data Pages. [Diagram: the table file /data/base/16385/24692 is a sequence of 8k pages.]
  8. Data Pages. [Diagram: each 8K page of /data/base/16385/24692 holds a Page Header, Items, Tuples, and a Special area.]
  9. File System Block Tuple. [Diagram: a single Tuple picked out of an 8K page (Page Header, Items, Tuples, Special) of /data/base/16385/24692.]
  10. File System Tuple. [Diagram: a Tuple is a Header followed by Values; the example conversions shown are int4in('9241') and textout() producing 'Martin'.] Header fields:
       OID: object id of tuple (optional)
       xmin: creation transaction id
       xmax: destruction transaction id
       cmin: creation command id
       cmax: destruction command id
       ctid: tuple id (page / item)
       natts: number of attributes
       infomask: tuple flags
       hoff: length of tuple header
       bits: bit map representing NULLs
  11. Tuple Header C Structures

         typedef struct HeapTupleFields
         {
             TransactionId t_xmin;       /* inserting xact ID */
             TransactionId t_xmax;       /* deleting or locking xact ID */

             union
             {
                 CommandId     t_cid;    /* inserting or deleting command ID, or both */
                 TransactionId t_xvac;   /* VACUUM FULL xact ID */
             } t_field3;
         } HeapTupleFields;

         typedef struct HeapTupleHeaderData
         {
             union
             {
                 HeapTupleFields  t_heap;
                 DatumTupleFields t_datum;
             } t_choice;

             ItemPointerData t_ctid;     /* current TID of this or newer tuple */

             /* Fields below here must match MinimalTupleData! */

             uint16 t_infomask2;         /* number of attributes + various flags */
             uint16 t_infomask;          /* various flag bits, see below */
             uint8  t_hoff;              /* sizeof header incl. bitmap, padding */

             /* ^ - 23 bytes - ^ */

             bits8  t_bits[1];           /* bitmap of NULLs -- VARIABLE LENGTH */

             /* MORE DATA FOLLOWS AT END OF STRUCT */
         } HeapTupleHeaderData;
  12. Shared Memory Creation. [Diagram: the postmaster uses fork() to create postgres processes; each process has its own Program (Text), Data, and Stack areas, and all of them map the same Shared Memory.]
  13. Shared Memory. [Diagram: map of the structures kept in shared memory: PROC, Proc Array, LOCK, PROCLOCK, Lock Hashes, Lightweight Locks, Semaphores, XLOG Buffers, CLOG Buffers, Subtrans Buffers, Multi-XACT Buffers, Two-Phase Structs, Auto Vacuum, Btree Vacuum, Free Space Map, Statistics, Background Writer, Synchronized Scan, Shared Invalidation, Buffer Descriptors, and Shared Buffers.]
  14. Shared Buffers. [Diagram: 8k pages of the table file /data/base/16385/24692 are brought into Shared Buffers with read() and written back with write(); each 8K page holds a Page Header, Items, Tuples, and a Special area.] Each buffer has a Buffer Descriptor carrying a Pin Count (prevents page replacement) and an LWLock (for page changes).
  15. HeapTuples. [Diagram: a Postgres process holds a HeapTuple, a C pointer to a Tuple inside an 8K page (Page Header, Items, Tuples, Special) in Shared Buffers; the Tuple is a Header followed by Values, with the example conversions int4in('9241') and textout() producing 'Martin'.] Header fields, as on slide 10: OID (object id of tuple, optional), xmin (creation transaction id), xmax (destruction transaction id), cmin (creation command id), cmax (destruction command id), ctid (tuple id, page / item), natts (number of attributes), infomask (tuple flags), hoff (length of tuple header), bits (bit map representing NULLs).
  16. Finding A Tuple Value in C

         Datum
         nocachegetattr(HeapTuple tuple, int attnum, TupleDesc tupleDesc, bool *isnull)
         {
             HeapTupleHeader tup = tuple->t_data;
             Form_pg_attribute *att = tupleDesc->attrs;

             {
                 int i;

                 /*
                  * Note - This loop is a little tricky.  For each non-null attribute,
                  * we have to first account for alignment padding before the attr,
                  * then advance over the attr based on its length.  Nulls have no
                  * storage and no alignment padding either.  We can use/set
                  * attcacheoff until we reach either a null or a var-width attribute.
                  */
                 off = 0;
                 for (i = 0;; i++)       /* loop exit is at "break" */
                 {
                     if (HeapTupleHasNulls(tuple) && att_isnull(i, bp))
                         continue;       /* this cannot be the target att */

                     if (att[i]->attlen == -1)
                         off = att_align_pointer(off, att[i]->attalign, -1, tp + off);
                     else
                         /* not varlena, so safe to use att_align_nominal */
                         off = att_align_nominal(off, att[i]->attalign);

                     if (i == attnum)
                         break;

                     off = att_addlength_pointer(off, att[i]->attlen, tp + off);
                 }
             }

             return fetchatt(att[attnum], tp + off);
         }
  17. Value Access in C

         #define fetch_att(T,attbyval,attlen) \
         ( \
             (attbyval) ? \
             ( \
                 (attlen) == (int) sizeof(int32) ? \
                     Int32GetDatum(*((int32 *)(T))) \
                 : \
                 ( \
                     (attlen) == (int) sizeof(int16) ? \
                         Int16GetDatum(*((int16 *)(T))) \
                     : \
                     ( \
                         AssertMacro((attlen) == 1), \
                         CharGetDatum(*((char *)(T))) \
                     ) \
                 ) \
             ) \
             : \
             PointerGetDatum((char *) (T)) \
         )
  18. Test And Set Lock: Can Succeed Or Fail. [Diagram: a 1 is exchanged with the lock value, which is either 0 or 1. Success: the lock was 0 on exchange and now holds 1. Failure: the lock was 1 on exchange, meaning the lock was already taken.]
  19. Test And Set Lock x86 Assembler

         static __inline__ int
         tas(volatile slock_t *lock)
         {
             register slock_t _res = 1;

             /*
              * Use a non-locking test before asserting the bus lock.  Note that the
              * extra test appears to be a small loss on some x86 platforms and a small
              * win on others; it's by no means clear that we should keep it.
              */
             __asm__ __volatile__(
                 "   cmpb    $0,%1   \n"
                 "   jne     1f      \n"
                 "   lock            \n"
                 "   xchgb   %0,%1   \n"
                 "1: \n"
         :       "+q"(_res), "+m"(*lock)
         :
         :       "memory", "cc");
             return (int) _res;
         }
  20. Spin Lock: Always Succeeds. [Diagram: the same exchange of 1 with the lock value (0 or 1) as on slide 18. Success: the lock was 0 on exchange. Failure: the lock was 1 on exchange, meaning the lock was already taken; the process then sleeps, for an increasing duration each time, and tries again.] Spinlocks are designed for short-lived locking operations, such as access to control structures. They are not to be used to protect code that makes kernel calls or performs other heavy operations.
  21. Light Weight Locks: Sleep On Lock. [Diagram: the same shared memory map as on slide 13.] A process first attempts to acquire the lightweight lock and goes to sleep on a semaphore if the lock request fails. Spinlocks control access to the lightweight lock's control structure.
  22. Database Object Locks. [Diagram: the PROC, PROCLOCK, LOCK, and Lock Hashes structures.]
  23. Proc. [Diagram: PROC slots marked empty, used, used, empty, used, empty, together with the Proc Array.]
  24. Other Shared Memory Structures. [Diagram: the same shared memory map as on slide 13.]
  25. Conclusion. Pink Floyd: Wish You Were Here
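
Supplement to slides 5 to 7: a minimal sketch, in C, of how a table's file path and the byte offset of one of its 8k pages could be derived from the directory layout shown on those slides. The helper names and the `SKETCH_BLCKSZ` constant are hypothetical and are not PostgreSQL's own API, and the sketch ignores details such as large tables being split across several segment files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_BLCKSZ 8192      /* the 8k page size shown on the slides */

/*
 * Hypothetical helper: build "<datadir>/base/<database>/<table>".
 * Returns the length of the path, or -1 if buf is too small.
 */
static int
sketch_table_path(char *buf, size_t buflen, const char *datadir,
                  unsigned database_dir, unsigned table_file)
{
    int n;

    assert(buf != NULL && datadir != NULL);
    n = snprintf(buf, buflen, "%s/base/%u/%u",
                 datadir, database_dir, table_file);

    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}

/* Hypothetical helper: byte offset of page number blkno in the table file. */
static int64_t
sketch_page_offset(uint32_t blkno)
{
    return (int64_t) blkno * SKETCH_BLCKSZ;
}
```

With the numbers from slide 6, `sketch_table_path(buf, sizeof buf, "/data", 16385, 24692)` fills `buf` with `/data/base/16385/24692`, and page 3 of that file starts at byte 3 * 8192 = 24576.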
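
Supplement to slide 14: a minimal sketch of the idea that a pin count prevents page replacement. The `SketchBufferDesc` type and the functions are hypothetical and heavily simplified. In particular, the first-fit victim search is only an illustration and is not PostgreSQL's replacement policy, and a real shared descriptor would need locking around the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified buffer descriptor. */
typedef struct SketchBufferDesc
{
    int page_no;    /* which table page currently sits in this buffer */
    int pin_count;  /* how many users are working with the page right now */
} SketchBufferDesc;

/* Pin a buffer: while pinned, its page must not be replaced. */
static void
sketch_pin(SketchBufferDesc *buf)
{
    buf->pin_count++;
}

/* Drop one pin; the page becomes replaceable again once the count is 0. */
static void
sketch_unpin(SketchBufferDesc *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/*
 * Return the index of the first buffer whose page may be replaced,
 * or -1 if every buffer is pinned.
 */
static int
sketch_find_victim(const SketchBufferDesc *bufs, size_t nbufs)
{
    for (size_t i = 0; i < nbufs; i++)
    {
        if (bufs[i].pin_count == 0)
            return (int) i;
    }
    return -1;
}
```

A process would call `sketch_pin()` before looking at a page and `sketch_unpin()` when done, so that the buffer cannot be reused for a different page in between.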
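
Supplement to slide 16: a self-contained sketch of the offset walk that the comment in `nocachegetattr` describes, that is, skip NULLs, add alignment padding before each attribute, then advance by its length. It handles fixed-width attributes only. The `SketchAttr` type and the `sketch_` functions are hypothetical stand-ins for the PostgreSQL structures and macros on the slide, and the sketch assumes that a clear bit in the bitmap marks a NULL attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified attribute description: fixed-width types only. */
typedef struct SketchAttr
{
    size_t attlen;      /* storage size in bytes */
    size_t attalign;    /* required alignment in bytes (a power of two) */
} SketchAttr;

/* In this sketch a clear bit in the bitmap marks a NULL attribute. */
static bool
sketch_att_isnull(int attnum, const uint8_t *bits)
{
    return !(bits[attnum >> 3] & (1 << (attnum & 0x07)));
}

/* Round off up to the next multiple of alignval. */
static size_t
sketch_align(size_t off, size_t alignval)
{
    return (off + alignval - 1) & ~(alignval - 1);
}

/*
 * Offset of attribute attnum (0-based) within the tuple's data area.
 * bits may be NULL when the tuple contains no NULLs.
 */
static size_t
sketch_attr_offset(const SketchAttr *att, int attnum, const uint8_t *bits)
{
    size_t off = 0;

    /* the target itself must not be NULL, or the loop would never stop */
    assert(bits == NULL || !sketch_att_isnull(attnum, bits));

    for (int i = 0;; i++)       /* loop exit is at "break" */
    {
        if (bits != NULL && sketch_att_isnull(i, bits))
            continue;           /* NULLs have no storage and no padding */

        off = sketch_align(off, att[i].attalign);
        if (i == attnum)
            break;

        off += att[i].attlen;
    }
    return off;
}
```

For a row with a 1-byte, a 4-byte, a 2-byte, and another 4-byte column, the last column starts at offset 12 when nothing is NULL, but at offset 4 when the second column is NULL, because the NULL column takes no space and needs no padding.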
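
Supplement to slides 18 to 20: a portable sketch of a test-and-set lock and of a spinlock built on top of it. It swaps in C11 `atomic_flag` for the x86 assembler shown on slide 19, so it is not the code PostgreSQL uses. The `sketch_` names and the delay values (start at 1 microsecond, double up to about 1 millisecond) are assumptions chosen only for illustration, and `nanosleep()` assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for nanosleep() */
#endif

#include <assert.h>
#include <stdatomic.h>
#include <time.h>

typedef atomic_flag sketch_slock_t;

/*
 * Test-and-set: atomically store "taken" and report the previous state.
 * Returns 0 on success (lock was free), 1 on failure (already taken).
 */
static int
sketch_tas(sketch_slock_t *lock)
{
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Release the lock so that the next test-and-set can succeed. */
static void
sketch_unlock(sketch_slock_t *lock)
{
    atomic_flag_clear(lock);
}

/*
 * Spin lock: retry the test-and-set until it succeeds, sleeping for an
 * increasing duration after each failure.  Returns the number of sleeps.
 */
static int
sketch_spin_lock(sketch_slock_t *lock)
{
    long       delay_ns = 1000;         /* hypothetical starting delay */
    const long max_delay_ns = 1000000;  /* hypothetical cap */
    int        sleeps = 0;

    while (sketch_tas(lock) != 0)
    {
        struct timespec ts = {0, delay_ns};

        nanosleep(&ts, NULL);
        sleeps++;
        if (delay_ns < max_delay_ns)
            delay_ns *= 2;              /* back off a little longer each time */
    }
    return sleeps;
}
```

A lock would be declared as `sketch_slock_t lock = ATOMIC_FLAG_INIT;`. Calling `sketch_tas(&lock)` can fail, as on slide 18, while `sketch_spin_lock(&lock)` only returns once the lock is held, as on slide 20.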