Advertisement

More Related Content

Slideshows for you(20)

Advertisement

More from PostgresOpen(18)

Advertisement

Bruce Momjian - Inside PostgreSQL Shared Memory @ Postgres Open

  1. Inside PostgreSQL Shared Memory BRUCE MOMJIAN September, 2013 POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the shared memory structures used by Postgres. Creative Commons Attribution License http://momjian.us/presentations Inside PostgreSQL Shared Memory 1 / 25
  2. Outline 1. File storage format 2. Shared memory creation 3. Shared buffers 4. Row value access 5. Locking 6. Other structures Inside PostgreSQL Shared Memory 2 / 25
  3. File System /data Postgres Postgres Postgres /data Inside PostgreSQL Shared Memory 3 / 25
  4. File System /data/base Postgres Postgres Postgres /data /pg_clog /pg_multixact /pg_subtrans /pg_tblspc /pg_xlog /global /pg_twophase /base Inside PostgreSQL Shared Memory 4 / 25
  5. File System /data/base/db Postgres Postgres Postgres /data /base /16385 (production) /1 (template1) /17982 (devel) /16821 (test) /21452 (marketing) Inside PostgreSQL Shared Memory 5 / 25
  6. File System /data/base/db/table Postgres Postgres Postgres /data /base /16385 /24692 (customer) /27214 (order) /25932 (product) /25952 (employee) /27839 (part) Inside PostgreSQL Shared Memory 6 / 25
  7. File System Data Pages Postgres Postgres Postgres 8k 8k 8k 8k /data /base /16385 /24692 Inside PostgreSQL Shared Memory 7 / 25
  8. Data Pages Postgres Postgres Postgres Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k 8k /data /base /16385 /24692 Inside PostgreSQL Shared Memory 8 / 25
  9. File System Block Tuple Postgres Postgres Postgres Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k 8k /data /base /16385 /24692 Tuple Inside PostgreSQL Shared Memory 9 / 25
  10. File System Tuple hoff − length of tuple header infomask − tuple flags natts − number of attributes ctid − tuple id (page / item) cmax − destruction command id xmin − creation transaction id xmax − destruction transaction id cmin − creation command id bits − bit map representing NULLs OID − object id of tuple (optional) Tuple Value Value ValueValue Value Value ValueHeader int4in(’9241’) textout() ’Martin’ Inside PostgreSQL Shared Memory 10 / 25
  11. Tuple Header C Structures typedef struct HeapTupleFields { TransactionId t_xmin; /* inserting xact ID */ TransactionId t_xmax; /* deleting or locking xact ID */ union { CommandId t_cid; /* inserting or deleting command ID, or both */ TransactionId t_xvac; /* VACUUM FULL xact ID */ } t_field3; } HeapTupleFields; typedef struct HeapTupleHeaderData { union { HeapTupleFields t_heap; DatumTupleFields t_datum; } t_choice; ItemPointerData t_ctid; /* current TID of this or newer tuple */ /* Fields below here must match MinimalTupleData! */ uint16 t_infomask2; /* number of attributes + various flags */ uint16 t_infomask; /* various flag bits, see below */ uint8 t_hoff; /* sizeof header incl. bitmap, padding */ /* ^ − 23 bytes − ^ */ bits8 t_bits[1]; /* bitmap of NULLs −− VARIABLE LENGTH */ /* MORE DATA FOLLOWS AT END OF STRUCT */ } HeapTupleHeaderData; Inside PostgreSQL Shared Memory 11 / 25
  12. Shared Memory Creation postmaster postgres postgres Program (Text) Data Program (Text) Data Shared Memory Program (Text) Data Shared Memory Shared Memory Stack Stack fork() Stack Inside PostgreSQL Shared Memory 12 / 25
  13. Shared Memory Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Lock Hashes Auto Vacuum Btree Vacuum Buffer Descriptors Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Inside PostgreSQL Shared Memory 13 / 25
  14. Shared Buffers Page Header Item Item Item Tuple Tuple Tuple Special 8K Postgres Postgres Postgres 8k 8k 8k 8k /data /base /16385 /24692 Shared Buffers LWLock − for page changes Pin Count − prevent page replacement read() write() 8k 8k 8k Buffer Descriptors Inside PostgreSQL Shared Memory 14 / 25
  15. HeapTuples Shared Buffers Postgres HeapTuple C pointer hoff − length of tuple header infomask − tuple flags natts − number of attributes ctid − tuple id (page / item) cmax − destruction command id xmin − creation transaction id xmax − destruction transaction id cmin − creation command id bits − bit map representing NULLs OID − object id of tuple (optional) Tuple Value Value ValueValue Value Value ValueHeader int4in(’9241’) textout() ’Martin’ Page Header Item Item Item Tuple Tuple Tuple Special 8K 8k 8k 8k Inside PostgreSQL Shared Memory 15 / 25
  16. Finding A Tuple Value in C Datum nocachegetattr(HeapTuple tuple, int attnum, TupleDesc tupleDesc, bool *isnull) { HeapTupleHeader tup = tuple−>t_data; Form_pg_attribute *att = tupleDesc−>attrs; { int i; /* * Note − This loop is a little tricky. For each non−null attribute, * we have to first account for alignment padding before the attr, * then advance over the attr based on its length. Nulls have no * storage and no alignment padding either. We can use/set * attcacheoff until we reach either a null or a var−width attribute. */ off = 0; for (i = 0;; i++) /* loop exit is at "break" */ { if (HeapTupleHasNulls(tuple) && att_isnull(i, bp)) continue; /* this cannot be the target att */ if (att[i]−>attlen == −1) off = att_align_pointer(off, att[i]−>attalign, −1, tp + off); else /* not varlena, so safe to use att_align_nominal */ off = att_align_nominal(off, att[i]−>attalign); if (i == attnum) break; off = att_addlength_pointer(off, att[i]−>attlen, tp + off); } } return fetchatt(att[attnum], tp + off); } Inside PostgreSQL Shared Memory 16 / 25
  17. Value Access in C #define fetch_att(T,attbyval,attlen) ( (attbyval) ? ( (attlen) == (int) sizeof(int32) ? Int32GetDatum(*((int32 *)(T))) : ( (attlen) == (int) sizeof(int16) ? Int16GetDatum(*((int16 *)(T))) : ( AssertMacro((attlen) == 1), CharGetDatum(*((char *)(T))) ) ) ) : PointerGetDatum((char *) (T)) ) Inside PostgreSQL Shared Memory 17 / 25
  18. Test And Set Lock Can Succeed Or Fail 0/1 1 0 Success Was 0 on exchange Lock already taken Was 1 on exchange Failure 1 1 Inside PostgreSQL Shared Memory 18 / 25
  19. Test And Set Lock x86 Assembler static __inline__ int tas(volatile slock_t *lock) { register slock_t _res = 1; /* * Use a non−locking test before asserting the bus lock. Note that the * extra test appears to be a small loss on some x86 platforms and a small * win on others; it’s by no means clear that we should keep it. */ __asm__ __volatile__( " cmpb $0,%1 n" " jne 1f n" " lock n" " xchgb %0,%1 n" "1: n" : "+q"(_res), "+m"(*lock) : : "memory", "cc"); return (int) _res; } Inside PostgreSQL Shared Memory 19 / 25
  20. Spin Lock Always Succeeds 0/1 1 Success Was 0 on exchange Failure Was 1 on exchange Lock already taken Sleep of increasing duration 0 1 1 Spinlocks are designed for short-lived locking operations, like access to control structures. They are not be used to protect code that makes kernel calls or other heavy operations. Inside PostgreSQL Shared Memory 20 / 25
  21. Light Weight Locks Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Auto Vacuum Btree Vacuum Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Lock Hashes Buffer Descriptors Sleep On Lock Light weight locks attempt to acquire the lock, and go to sleep on a semaphore if the lock request fails. Spinlocks control access to the light weight lock control structure. Inside PostgreSQL Shared Memory 21 / 25
  22. Database Object Locks PROCLOCKPROC LOCK Lock Hashes Inside PostgreSQL Shared Memory 22 / 25
  23. Proc Proc Array used usedusedempty empty empty PROC Inside PostgreSQL Shared Memory 23 / 25
  24. Other Shared Memory Structures Shared Buffers Proc Array PROC Multi−XACT Buffers Two−Phase Structs Subtrans Buffers CLOG Buffers XLOG Buffers Shared Invalidation Lightweight Locks Lock Hashes Auto Vacuum Btree Vacuum Buffer Descriptors Background Writer Synchronized Scan Semaphores Statistics LOCK PROCLOCK Inside PostgreSQL Shared Memory 24 / 25
  25. Conclusion http://momjian.us/presentations Pink Floyd: Wish You Were Here Inside PostgreSQL Shared Memory 25 / 25
Advertisement