SlideShare a Scribd company logo
FRITS HOOGLAND
LINUX
MEMORY
EXPLAINED
Copyright © 2017 Accenture All rights reserved. | 2
ENVIRONMENT
Virtual machine: VirtualBox 5.1.28 r117968
Operating system: Oracle Linux 7.3
Kernel: 4.1.12-94.5.9.el7uek.x86_64
Oracle database: 12.2.0.1.170814
Amount of memory: 4GB
Copyright © 2017 Accenture All rights reserved. | 3
BOOT
An intel based system first starts it’s firmware, which is either BIOS or UEFI.
The firmware tries to boot. When it boots from disk:
• BIOS: scans disks for a MBR, which loads a boot loader: grub / grub2.
• UEFI: scans disk for GUID partition table, read system partition, which executes boot
loader: grub2.
Grub(2) loads the linux kernel into memory:
# dmesg | grep Command
[ 0.000000] Command line: BOOT_IMAGE=/
vmlinuz-4.1.12-94.3.4.el7uek.x86_64 root=/dev/mapper/ol-root ro
crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off
transparent_hugepage=never
Copyright © 2017 Accenture All rights reserved. | 4
BOOT
Of course this uses some memory:
# dmesg | grep Memory
[ 0.000000] Memory: 3743704K/4193848K available (7417K kernel code,
1565K rwdata, 3812K rodata, 1888K init, 3228K bss, 433760K reserved,
16384K cma-reserved)
This means the kernel left approximately 3.6G of the 4.0G of PHYSICAL memory for use
after loading.
This can be verified by looking at /proc/meminfo:
# head -1 /proc/meminfo
MemTotal: 3781460 kB
Copyright © 2017 Accenture All rights reserved. | 5
VIRTUAL MEMORY
Linux is an operating system that provides virtual memory.
• This means the addresses the processes see, are virtual memory addresses which are
unique to that process.
• This is called an ‘address space’.
• All memory allocations are only visible to the process that allocated these.
• The translation from virtual memory addresses to physical addresses is done by hardware
in the CPU, using what is called the ‘memory management unit’ or MMU.
• This is why specialistic memory management, like hugepages, is not only dependent
on operating system support, but also on hardware support.
• The management of physical memory is done by the operating system.
• This is why swapping occurs by an operating system process.
Copyright © 2017 Accenture All rights reserved. | 6
ADDRESS SPACE
A linux 64 bit process address space consists of the full 2^64 addresses:
# cat /proc/self/maps
00400000-0040b000 r-xp 00000000 f9:00 67398280 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 f9:00 67398280 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 f9:00 67398280 /usr/bin/cat
023c4000-023e5000 rw-p 00000000 00:00 0 [heap]
…lot's of other output…
7ffc4880e000-7ffc4882f000 rw-p 00000000 00:00 0 [stack]
7ffc48840000-7ffc48842000 r--p 00000000 00:00 0 [vvar]
7ffc48842000-7ffc48844000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
# echo "obase=16;2^64-1" | bc
FFFFFFFFFFFFFFFF
That’s 16 exa bytes! For every process!
Copyright © 2017 Accenture All rights reserved. | 7
PHYSICAL MEMORY LIMITATIONS
Please mind the 16 exabytes is virtual memory!
Physical memory is limited by multiple factors.
First factor is physical implementation of memory addressing on the CPU:
# grep address size /proc/cpuinfo
address sizes : 46 bits physical, 48 bits virtual
address sizes : 46 bits physical, 48 bits virtual
address sizes : 46 bits physical, 48 bits virtual
address sizes : 46 bits physical, 48 bits virtual
This is saying the linux kernel detects a CPU with 46 bits for addressing memory.
# echo "2^46" | bc
70368744177664
That’s 64 terabyte. Per CPU socket.
Copyright © 2017 Accenture All rights reserved. | 8
PHYSICAL MEMORY LIMITATIONS
64 terabyte is a lot. In a (typical) 2 socket setup, that is 128 terabyte of memory potentially
addressable.
The second limitation is the actual major limitation on physical memory.
That is the on-die memory controller supported physical memory types and configuration,
and the physical hardware motherboard setup.
Copyright © 2017 Accenture All rights reserved. | 9
PHYSICAL MEMORY LIMITATIONS
This is the DIMM and processor physical layout from the X6-2 service manual:
It has got 12 DIMM slots per socket, and the manual describes it supports DIMMs up to 64G
in size. This means that the maximal amount of physical memory is: 64*12*2=1536G = 1.5T!
Copyright © 2017 Accenture All rights reserved. | 10
MEMORY: PHYSICAL AND VIRTUAL
Okay. So far we got:
Every linux process has its own address space of 16 exabytes of addressable virtual
memory.
A typical current Intel Xeon CPU has 46 bits physically available for memory addressing,
which means every CPU can theoretically address 64T.
A system is limited in physical memory by memory module size, number and populating
rules dictated by the on die memory manager(s) (type of memory, number of channels and
number of slots per channel)*, and the physical memory motherboard configuration.
On an Oracle X6-2 system this is maximally 1.5T. Same for X7-2.
Copyright © 2017 Accenture All rights reserved. | 11
EXECUTABLE MEMORY USAGE
What happens when an executable is executed in a shell on linux?
Common thinking is:
• The process is cloned.
• Loads the executable.
• Executes it.
• Returns the return code to the parent process.
Memory needed = executable size.
Which very roughly is what happens, but there’s a lot more to it than that!
…and memory size that is needed is commonly NOT executable size!
Copyright © 2017 Accenture All rights reserved. | 12
EXECUTABLE MEMORY USAGE
A linux ELF executable is not entirely loaded into memory and then executed.
For the ELF executable itself, some memory regions are allocated:
# less /proc/self/maps
00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less
00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less
00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less
00625000-0062a000 rw-p 00000000 00:00 0
0255d000-0257e000 rw-p 00000000 00:00 0 [heap]
7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so
…etc…
Copyright © 2017 Accenture All rights reserved. | 13
EXECUTABLE MEMORY USAGE
00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less
Code segment. Contains code to execute, is obviously readonly and executable.
00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less
Rodata segment. Contains readonly variables, obviously is readonly too.
00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less
Data segment. Contains variables, obviously NOT readonly.
00625000-0062a000 rw-p 00000000 00:00 0
BSS segment. Contains statically allocated variables that are zero-valued initially.
These are possible segments.
There can be more and others.
Code and data segments seem to be always used.
Copyright © 2017 Accenture All rights reserved. | 14
EXECUTABLE MEMORY USAGE
00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less
00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less
00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less
00625000-0062a000 rw-p 00000000 00:00 0
421000 - 400000 = 21000
621000 - 620000 = 1000
625000 - 621000 = 4000 +
0x26000 = 155648 = 152K
# du -sh $(which less)
156K /bin/less
Copyright © 2017 Accenture All rights reserved. | 15
EXECUTABLE MEMORY USAGE
A way to look at the total size of a process is looking at the VSZ/VSS or virtual set size:
# ps -o pid,vsz,cmd -C less
PID VSZ CMD
19477 110256 less /proc/self/maps
(VSZ is in KB)
However, this tells the virtual set size is 110MB!
That is very much different from 152K of the ‘less’ executable?!
Copyright © 2017 Accenture All rights reserved. | 16
EXECUTABLE MEMORY USAGE
Let’s execute less /proc/self/maps again:
00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less
00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less
00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less
00625000-0062a000 rw-p 00000000 00:00 0
0255d000-0257e000 rw-p 00000000 00:00 0 [heap]
7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f5150838000-7f515083c000 r--p 001b6000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f515083c000-7f515083e000 rw-p 001ba000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f515083e000-7f5150843000 rw-p 00000000 00:00 0
7f5150843000-7f5150868000 r-xp 00000000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9
7f5150868000-7f5150a67000 ---p 00025000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9
7f5150a67000-7f5150a6b000 r--p 00024000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9
7f5150a6b000-7f5150a6c000 rw-p 00028000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9
…etc…
There’s more stuff visible in the address space!
Copyright © 2017 Accenture All rights reserved. | 17
EXECUTABLE MEMORY USAGE
The ‘less’ executable is a dynamic linked executable.
This can be seen by using the ‘ldd’ executable: shared library (loader) dependencies:
# ldd $(which less)
linux-vdso.so.1 => (0x00007ffecc5bd000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fe0db553000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe0db192000)
/lib64/ld-linux-x86-64.so.2 (0x000055fde3a3b000)
This means the less executable requires four libraries to run!
• linux-vdso.so.1 Meta-library, injected by the kernel to solve slow sys calls.
• libtinfo.so.5 Terminal info library.
• libc.so.6 General c library.
• ld-linux-x86-64.so.2 Program interpreter for loading libraries.
Copyright © 2017 Accenture All rights reserved. | 18
EXECUTABLE MEMORY USAGE
Okay that’s:
/bin/less 156K
/lib64/libtinfo.so.5.9 168K
/lib64/libc-2.17.so 2.1 M
/lib64/ld-2.17.so 156K +
2.580M versus VSZ: 110’256K !!
Turns out there is the following entry in maps:
7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
# du -sh /usr/lib/locale/locale-archive
102M /usr/lib/locale/locale-archive
Ah! 2.5M (exec+libs) + 102M (locale-archive) + some other bits in maps is approx. 110’245K!
Copyright © 2017 Accenture All rights reserved. | 19
VIRTUAL SET SIZE
That is VSZ, virtual set size, explained:
• The total size of all memory allocations in an address space.
Now that you know about virtual set size: forget about it.
It’s a useless statistic for determining memory usage, it says nothing… at all!
Copyright © 2017 Accenture All rights reserved. | 20
VIRTUAL SET SIZE
• Lazy loading and allocation.
• Only the bare minimum of a file is read into the address space.
• Page fault
• Accessing a page of a memory allocation that hasn’t been read before in the address
space causes the MMU to trigger an interrupt which calls the kernel ‘page fault’
routine.
• ‘fault’ means the page is not yet available.
• Causing this page to be read into the process’ address space.
• Commonly referred to as ‘page in’.
There is a statistic that shows the amount of pages actually available in the address space.
This is what is called ‘resident set size’ or RSS.
Copyright © 2017 Accenture All rights reserved. | 21
RESIDENT SET SIZE
Let’s look at the less command with ‘ps’ including the RSS column:
# ps -o pid,vsz,rss,cmd -C less
PID VSZ RSS CMD
21908 110256 1940 less /proc/self/maps
That is 1940K versus 110256K !!!!!
Do you see how much work has been prevented by using lazy loading???
Actually, you can see how much is resident per memory allocation by looking at the ‘smaps’
file in proc.
Copyright © 2017 Accenture All rights reserved. | 22
RESIDENT SET SIZE
# less /proc/21908/smaps
…
7fc57c21c000-7fc582745000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
Size: 103588 kB
Rss: 436 kB
Pss: 48 kB
Shared_Clean: 436 kB
Shared_Dirty: 0 kB
…
Only a tiny portion of /usr/lib/locale/locale-archive is actually used (Size versus Rss).
Copyright © 2017 Accenture All rights reserved. | 23
RESIDENT SET SIZE
So, now we got virtual set size (VSZ or sometimes VSS) that is the full size of files and
memory allocations, which only very seldom gets truly allocated. It is not a useful statistic
for determining memory usage.
For being able to understand what is truly used in an address space we got resident set size
(RSS). Does RSS show the actual memory used by a process?
No.
Linux has another trick up its sleeve.
Any page that is in the address space of a process, is shared with all of its child processes.
By using a technique called ‘copy on write’ (COW), changes to a page make that page
unique for that process.
Copyright © 2017 Accenture All rights reserved. | 24
PROPORTIONAL SET SIZE
So, we apparently got this fantastic page sharing mechanism.
But is there any way to see how and if this is being used?
Yes. Linux keeps a statistic in ‘smaps’ that is called ‘Pss’: proportional set size.
The PSS value is calculated in the following way:
• The sum of the size of every resident page,
• For which the size of every page is divided by the number of processes it is shared with.
Copyright © 2017 Accenture All rights reserved. | 25
PROPORTIONAL SET SIZE
Let’s look at the ‘smaps’ file for the less command again, at the ‘locale-archive’ allocation:
# less /proc/21908/smaps
…
7fc57c21c000-7fc582745000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
Size: 103588 kB
Rss: 436 kB
Pss: 48 kB
Shared_Clean: 436 kB
Shared_Dirty: 0 kB
…
The full size is 103588K, of which only 436K is resident in the process space, which is
shared with other processes, because proportional only 48K is used!
Copyright © 2017 Accenture All rights reserved. | 26
ORACLE MEMORY USAGE
Of course linux applies these techniques to running Oracle too:
# grep -e - -e ^Size -e ^Rss -e ^Pss /proc/$(pidof ora_vktm_test)/smaps
00400000-13ac8000 r-xp 00000000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle
Size: 318240 kB
Rss: 43192 kB
Pss: 1916 kB
13cc7000-13ead000 r--p 136c7000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle
Size: 1944 kB
Rss: 884 kB
Pss: 37 kB
13ead000-13ef3000 rw-p 138ad000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle
Size: 280 kB
Rss: 24 kB
Pss: 24 kB
…etc…
Copyright © 2017 Accenture All rights reserved. | 27
LINUX SHARED MEMORY
How about shared memory?
There is no ‘owner’ of shared memory, it is used by every process it is shared with.
Every processes will page in whatever it actually needs:
System V shared memory:
60c00000-96800000 rw-s 00000000 00:05 163844 /SYSV00000000 (deleted)
Size: 880640 kB
Rss: 884 kB
Pss: 124 kB
But does shared memory need to exist BEFORE processes can use it?
Copyright © 2017 Accenture All rights reserved. | 28
LINUX SHARED MEMORY
No:
# ipcs -mu
------ Shared Memory Status --------
segments allocated 4
pages allocated 256005
pages resident 83055
pages swapped 0
Swap performance: 0 attempts 0 successes
The same lazy allocation mechanism applies to shared memory too.
(pages allocated versus pages resident)
Copyright © 2017 Accenture All rights reserved. | 29
ORACLE SHARED MEMORY PAGING
Actually, there are a few settings that influences the way Oracle uses shared memory.
_touch_sga_pages_during_allocation
During startup the bequeathing process touches SGA pages, which allocates them.
Default value: false.
(Setting it to true doesn’t seem to do anything on my test systems)
pre_page_sga
In the nomount phase a process (sa00) is spawned by the database that touches SGA
pages, and vanishes after it’s done.
Default value: false*.
This means that by default Oracle allocates SGA in a lazy manner too!
Copyright © 2017 Accenture All rights reserved. | 30
HUGE / LARGE PAGES
Hugepages is a joint operating system and hardware (MMU) feature to increase the memory
administration unit (‘page’) size.
Regular page size is 4K.
2MB page size support is dependent on cpu flag ‘pse’:
# grep pse /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni
pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm
1GB page size support is dependent on cpu flag ‘pdpe1gb’:
# grep pdpe1gb /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts
tpr_shadow vnmi flexpriority ept vpid
Copyright © 2017 Accenture All rights reserved. | 31
HUGE / LARGE PAGES
Why use hugepages?
• Page addresses are cached in a memory area that is called ‘transaction lookaside buffer’
or TLB.
• Having fewer entries improves TLB cache hit ratio, and reduces CPU usage.
• Hugepages are non-swappable.
• Hugepages reduces per process page address administration size: Pagetable size.
Hugepages memory creation requires physical memory that has not been used by 4K pages
(‘fragmented’) previously.
Memory that is configured for hugepages, is not usable for regular 4K allocations ! ! ! !
Even if none of the hugepages are used by any process.
Copyright © 2017 Accenture All rights reserved. | 32
HUGE / LARGE PAGES
# grep ^HugePage /proc/meminfo
HugePages_Total: 1126
HugePages_Free: 784
HugePages_Rsvd: 159
HugePages_Surp: 0
Used 1126 (tot) - 784 (free) = 342 Used
Reserved = 159 Reserved
Real free 784 (free) - 159 (rsvd) = 625 Free +
1126 Total
Copyright © 2017 Accenture All rights reserved. | 33
HUGE / LARGE PAGES
The reporting for hugepages memory usage by ipcs is ‘kind of’ broken:
# ipcs -mu
------ Shared Memory Status --------
segments allocated 4
pages allocated 256517
pages resident 175105
256517 * 4K = 1002M.
In other words: ipcs -mu reports the hugepages as regular (4K) pages!
Copyright © 2017 Accenture All rights reserved. | 34
HUGE / LARGE PAGES
The reporting for hugepages memory allocations in /proc/PID/smaps is ‘kind of’ broken too:
# less /proc/$(pidof ora_pmon_test)/smaps
60400000-96400000 rw-s 00000000 00:0e 360449 /SYSV00000000 (deleted)
Size: 884736 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
This is the variable SGA shared memory alloc. Rss says we allocated nothing. Not true.
Copyright © 2017 Accenture All rights reserved. | 35
ANONYMOUS ALLOCATIONS
Outside of the code, data and rodata allocations mapped from a file into memory, there’s
memory allocations NOT backed by a file.
00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less
00625000-0062a000 rw-p 00000000 00:00 0
0255d000-0257e000 rw-p 00000000 00:00 0 [heap]
7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive
7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f5150838000-7f515083c000 r--p 001b6000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f515083c000-7f515083e000 rw-p 001ba000 f9:00 134334316 /usr/lib64/libc-2.17.so
7f515083e000-7f5150843000 rw-p 00000000 00:00 0
7f5150843000-7f5150868000 r-xp 00000000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9
The allocations above are BSS segments.
Memory explicitly allocated using malloc() shows up identical, also being anonymous
allocations.
The PGA of Oracle processes is allocated in anonymous allocations.
Copyright © 2017 Accenture All rights reserved. | 36
MEMORY USAGE CALCULATION
Is there a simple way to calculate actual, total, memory usage for Oracle on Linux?
Answer: no, not really.
VSZ: has nothing to do with actual usage.
RSS: describes memory allocation for a process, the pages are allocated, but pages can be
shared, so RSS will hugely oversubscribe. Unless RSS = PSS.
PSS: describes proportional memory usage, only if all processes PSS sizes are calculated
together, at the same time, it will tell actual allocation (this is VERY fluent information!).
For process private memory, PSS is the way to go. But for shared memory, you can’t use a
process centric method. Memory could be allocated by a process that quit.
Copyright © 2017 Accenture All rights reserved. | 37
MEMORY USAGE CALCULATION
The only way to realistically say something about linux memory usage, is using these statistics in /proc/
meminfo:
• MemFree: idle memory.
• KernelStack, Slab: kernel memory.
• Buffers: linux kernel buffers for IO.
• SwapCached: cache for swapped pages.
• Cached (- Shmem): linux page cache and memory mapped files.
• PageTables: virtual to physical memory addresses administration.
• Shmem: small pages shared memory allocation.
• AnonPages: all anonymous memory allocations.
• Hugepages_Total - Hugepages_Free: Used hugepages (number in pages)
• Hugepages_Rsvd: Logically allocated, not yet physically allocated hugepages (number in pages)
• Hugepages_Free-Hugepages_Rsvd: true free hugepages (number in pages)
Copyright © 2017 Accenture All rights reserved. | 38
DEMO: LIVE MEMORY USAGE
If you want to do the same yourself:
1. Install node_exporter (https://github.com/prometheus/node_exporter)
• This is the fetcher (‘exporter’) of the statistics.
2. Install prometheus (https://github.com/prometheus/prometheus)
• This is the persistence layer for the statistics.
3. Install grafana (http://grafana.com)
• This is the presentation/visualisation layer.
4. Setup prometheus to fetch data from node_exporter (https://resin.io/blog/monitoring-
linux-stats-with-prometheus-io/)
• node_exporter → prometheus.
5. Add prometheus as a grafana source (https://prometheus.io/docs/visualization/grafana/)
• prometheus → grafana
6. Import grafana dashboard id: 2747
(https://fritshoogland.wordpress.com/2017/07/31/installation-overview-of-node_exporter-
prometheus-and-grafana/)
Copyright © 2017 Accenture All rights reserved. | 39
Copyright © 2017 Accenture All rights reserved. | 40
Now let’s startup the Oracle database.
Important settings:
• sga_target = 2G
• pga_aggregate_target = 500M*
• filesystemio_options = setall
DEMO: LIVE MEMORY USAGE
Copyright © 2017 Accenture All rights reserved. | 41
Copyright © 2017 Accenture All rights reserved. | 42
Copyright © 2017 Accenture All rights reserved. | 43
Copyright © 2017 Accenture All rights reserved. | 44
Copyright © 2017 Accenture All rights reserved. | 45
Copyright © 2017 Accenture All rights reserved. | 46
Copyright © 2017 Accenture All rights reserved. | 47
I didn’t mention huge pages (yet)! Let’s see how that looks like:
DEMO: LIVE MEMORY USAGE
Copyright © 2017 Accenture All rights reserved. | 48
Copyright © 2017 Accenture All rights reserved. | 49
Copyright © 2017 Accenture All rights reserved. | 50
Copyright © 2017 Accenture All rights reserved. | 51
Copyright © 2017 Accenture All rights reserved. | 52
Copyright © 2017 Accenture All rights reserved. | 53
Copyright © 2017 Accenture All rights reserved. | 54
Copyright © 2017 Accenture All rights reserved. | 55
A computer system can use a limited amount of memory because of memory manager limitations and
physical slots.
The linux kernel needs some memory to run, which means MemTotal in /proc/meminfo will not give you
the total physical amount of memory.
An executable can be ‘dynamically linked’, which means it uses libraries for additional functionality. The
oracle database executable is an example of that.
CONCLUSION
Copyright © 2017 Accenture All rights reserved. | 56
The full size of an allocation is called virtual set size, which is not a useful statistic for determining
memory usage.
The amount of pages for any allocation in an address space that is truly directly available is called the
resident set size.
Any process shares its memory allocations with its child processes. Only once a page is modified, it is
made unique to a process.
The page sharing can be seen by looking at the resident set size and the proportional set size in /proc/
PID/smaps.
CONCLUSION
Copyright © 2017 Accenture All rights reserved. | 57
Huge pages memory can only be used for huge pages allocations. Even if there is a shortage of small
pages.
Shortage of (smallpages) memory results in pages being moved to the swap device.
Cached/Mapped and Anonymous pages are the first to be swapped.
Shared memory pages will be swapped when memory pressure persists.
Other/kernel related pages can be swapped too, but do not seem to get swapped too much.
PageTables: if these take a significant amount, consider hugepages.
CONCLUSION
Copyright © 2017 Accenture All rights reserved. | 58
The lazy loading can make memory shortage/swapping diagnosis hard.
Any memory allocation is done lazy.
The Oracle SGA is not fully physically allocated at startup (depending on settings). This means when
sized too big (especially in a consolidated environment) it might take some time before memory is truly
allocated and memory pressure becomes visible.
The Oracle PGA is not allocated at startup, but allocated per process on demand, and can free memory
too. PGA_AGGREGATE_TARGET is a target value for instance work area’s. PGA memory always is in
Anonymous memory allocations on linux.
PGA can easily allocate huge amounts of memory, for example with ‘collections’.
Having lots of sessions and lots of cursors can lead to a lot of Anonymous memory locked and wasted.
CONCLUSION
Copyright © 2017 Accenture All rights reserved. | 59
Because any memory operation is done lazy, moving swapped pages back into memory is also done lazy.
This means a past memory shortage can lead to swap being in use currently, with no actual memory
problem.
There only is a memory pressure/swapping problem if there is a persistent swapping out and swapping in
happening, or single way at a high rate, leading to processes getting stuck. Not because swap is filled for
X%
CONCLUSION
Copyright © 2017 Accenture All rights reserved. | 60
Q & A

More Related Content

Similar to linux-memory-explained.pdf

Hardware Discovery Commands
Hardware Discovery CommandsHardware Discovery Commands
Hardware Discovery Commands
Kevin OBrien
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Brendan Gregg
 
R&D work on pre exascale HPC systems
R&D work on pre exascale HPC systemsR&D work on pre exascale HPC systems
R&D work on pre exascale HPC systems
Joshua Mora
 
Raspberry Pi tutorial
Raspberry Pi tutorialRaspberry Pi tutorial
Raspberry Pi tutorial
艾鍗科技
 
Clear cache memory
Clear cache memoryClear cache memory
Clear cache memory
Abdullah Al Muzammi
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
Alessandro Selli
 
Vortex86 Sx Linux How To
Vortex86 Sx Linux How ToVortex86 Sx Linux How To
Vortex86 Sx Linux How To
Rogelio Canedo
 
RamDisk
RamDiskRamDisk
9i hp relnotes
9i hp relnotes9i hp relnotes
9i hp relnotes
Anil Pandey
 
Analisis_avanzado_vmware
Analisis_avanzado_vmwareAnalisis_avanzado_vmware
Analisis_avanzado_vmware
virtualizacionTV
 
Advanced Root Cause Analysis
Advanced Root Cause AnalysisAdvanced Root Cause Analysis
Advanced Root Cause Analysis
Eric Sloof
 
SOFA Tutorial
SOFA TutorialSOFA Tutorial
SOFA Tutorial
NTU CSIE, Taiwan
 
Qemu - Raspberry | while42 Singapore #2
Qemu - Raspberry | while42 Singapore #2Qemu - Raspberry | while42 Singapore #2
Qemu - Raspberry | while42 Singapore #2
While42
 
101 2.1 design hard disk layout v2
101 2.1 design hard disk layout v2101 2.1 design hard disk layout v2
101 2.1 design hard disk layout v2
Acácio Oliveira
 
Tesla Hacking to FreedomEV
Tesla Hacking to FreedomEVTesla Hacking to FreedomEV
Tesla Hacking to FreedomEV
Jasper Nuyens
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
Brendan Gregg
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
tomasbart
 
BeagleBone Black Booting Process
BeagleBone Black Booting ProcessBeagleBone Black Booting Process
BeagleBone Black Booting Process
SysPlay eLearning Academy for You
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep dive
Naoto MATSUMOTO
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
Spark Summit
 

Similar to linux-memory-explained.pdf (20)

Hardware Discovery Commands
Hardware Discovery CommandsHardware Discovery Commands
Hardware Discovery Commands
 
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPFLinux 4.x Tracing: Performance Analysis with bcc/BPF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
 
R&D work on pre exascale HPC systems
R&D work on pre exascale HPC systemsR&D work on pre exascale HPC systems
R&D work on pre exascale HPC systems
 
Raspberry Pi tutorial
Raspberry Pi tutorialRaspberry Pi tutorial
Raspberry Pi tutorial
 
Clear cache memory
Clear cache memoryClear cache memory
Clear cache memory
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
 
Vortex86 Sx Linux How To
Vortex86 Sx Linux How ToVortex86 Sx Linux How To
Vortex86 Sx Linux How To
 
RamDisk
RamDiskRamDisk
RamDisk
 
9i hp relnotes
9i hp relnotes9i hp relnotes
9i hp relnotes
 
Analisis_avanzado_vmware
Analisis_avanzado_vmwareAnalisis_avanzado_vmware
Analisis_avanzado_vmware
 
Advanced Root Cause Analysis
Advanced Root Cause AnalysisAdvanced Root Cause Analysis
Advanced Root Cause Analysis
 
SOFA Tutorial
SOFA TutorialSOFA Tutorial
SOFA Tutorial
 
Qemu - Raspberry | while42 Singapore #2
Qemu - Raspberry | while42 Singapore #2Qemu - Raspberry | while42 Singapore #2
Qemu - Raspberry | while42 Singapore #2
 
101 2.1 design hard disk layout v2
101 2.1 design hard disk layout v2101 2.1 design hard disk layout v2
101 2.1 design hard disk layout v2
 
Tesla Hacking to FreedomEV
Tesla Hacking to FreedomEVTesla Hacking to FreedomEV
Tesla Hacking to FreedomEV
 
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPFOSSNA 2017 Performance Analysis Superpowers with Linux BPF
OSSNA 2017 Performance Analysis Superpowers with Linux BPF
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
BeagleBone Black Booting Process
BeagleBone Black Booting ProcessBeagleBone Black Booting Process
BeagleBone Black Booting Process
 
x86_64 Hardware Deep dive
x86_64 Hardware Deep divex86_64 Hardware Deep dive
x86_64 Hardware Deep dive
 
Spark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg SchadSpark Summit EU talk by Jorg Schad
Spark Summit EU talk by Jorg Schad
 

More from TricantinoLopezPerez

con9578-2088758.pdf
con9578-2088758.pdfcon9578-2088758.pdf
con9578-2088758.pdf
TricantinoLopezPerez
 
sqltuning101-170419021007-2.pdf
sqltuning101-170419021007-2.pdfsqltuning101-170419021007-2.pdf
sqltuning101-170419021007-2.pdf
TricantinoLopezPerez
 
Oracle Wait Events That Everyone Should Know.ppt
Oracle Wait Events That Everyone Should Know.pptOracle Wait Events That Everyone Should Know.ppt
Oracle Wait Events That Everyone Should Know.ppt
TricantinoLopezPerez
 
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdfpdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
TricantinoLopezPerez
 
exadata-database-machine-kpis-3224944.pdf
exadata-database-machine-kpis-3224944.pdfexadata-database-machine-kpis-3224944.pdf
exadata-database-machine-kpis-3224944.pdf
TricantinoLopezPerez
 
TA110_System_Capacity_Plan.doc
TA110_System_Capacity_Plan.docTA110_System_Capacity_Plan.doc
TA110_System_Capacity_Plan.doc
TricantinoLopezPerez
 
kscope2013vst-130627142814-phpapp02.pdf
kscope2013vst-130627142814-phpapp02.pdfkscope2013vst-130627142814-phpapp02.pdf
kscope2013vst-130627142814-phpapp02.pdf
TricantinoLopezPerez
 
TGorman Collab16 UnixTools 20160411.pdf
TGorman Collab16 UnixTools 20160411.pdfTGorman Collab16 UnixTools 20160411.pdf
TGorman Collab16 UnixTools 20160411.pdf
TricantinoLopezPerez
 
BrownbagIntrotosqltuning.ppt
BrownbagIntrotosqltuning.pptBrownbagIntrotosqltuning.ppt
BrownbagIntrotosqltuning.ppt
TricantinoLopezPerez
 
sqltuningcardinality1(1).ppt
sqltuningcardinality1(1).pptsqltuningcardinality1(1).ppt
sqltuningcardinality1(1).ppt
TricantinoLopezPerez
 
ash-1-130820122404-phpapp01.pdf
ash-1-130820122404-phpapp01.pdfash-1-130820122404-phpapp01.pdf
ash-1-130820122404-phpapp01.pdf
TricantinoLopezPerez
 

More from TricantinoLopezPerez (11)

con9578-2088758.pdf
con9578-2088758.pdfcon9578-2088758.pdf
con9578-2088758.pdf
 
sqltuning101-170419021007-2.pdf
sqltuning101-170419021007-2.pdfsqltuning101-170419021007-2.pdf
sqltuning101-170419021007-2.pdf
 
Oracle Wait Events That Everyone Should Know.ppt
Oracle Wait Events That Everyone Should Know.pptOracle Wait Events That Everyone Should Know.ppt
Oracle Wait Events That Everyone Should Know.ppt
 
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdfpdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
pdfslide.net_em12c-capacity-planning-with-oem-metrics_2.pdf
 
exadata-database-machine-kpis-3224944.pdf
exadata-database-machine-kpis-3224944.pdfexadata-database-machine-kpis-3224944.pdf
exadata-database-machine-kpis-3224944.pdf
 
TA110_System_Capacity_Plan.doc
TA110_System_Capacity_Plan.docTA110_System_Capacity_Plan.doc
TA110_System_Capacity_Plan.doc
 
kscope2013vst-130627142814-phpapp02.pdf
kscope2013vst-130627142814-phpapp02.pdfkscope2013vst-130627142814-phpapp02.pdf
kscope2013vst-130627142814-phpapp02.pdf
 
TGorman Collab16 UnixTools 20160411.pdf
TGorman Collab16 UnixTools 20160411.pdfTGorman Collab16 UnixTools 20160411.pdf
TGorman Collab16 UnixTools 20160411.pdf
 
BrownbagIntrotosqltuning.ppt
BrownbagIntrotosqltuning.pptBrownbagIntrotosqltuning.ppt
BrownbagIntrotosqltuning.ppt
 
sqltuningcardinality1(1).ppt
sqltuningcardinality1(1).pptsqltuningcardinality1(1).ppt
sqltuningcardinality1(1).ppt
 
ash-1-130820122404-phpapp01.pdf
ash-1-130820122404-phpapp01.pdfash-1-130820122404-phpapp01.pdf
ash-1-130820122404-phpapp01.pdf
 

Recently uploaded

在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
yhkox
 
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
nguqayx
 
Community Skills Building Workshop | PMI Silver Spring Chapter | June 12, 2024
Community Skills Building Workshop | PMI Silver Spring Chapter  | June 12, 2024Community Skills Building Workshop | PMI Silver Spring Chapter  | June 12, 2024
Community Skills Building Workshop | PMI Silver Spring Chapter | June 12, 2024
Hector Del Castillo, CPM, CPMM
 
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
waldorfnorma258
 
All Of My Java Codes With A Sample Output.docx
All Of My Java Codes With A Sample Output.docxAll Of My Java Codes With A Sample Output.docx
All Of My Java Codes With A Sample Output.docx
adhitya5119
 
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
GabrielleSinaga
 
How to overcome obstacles in the way of success.pdf
How to overcome obstacles in the way of success.pdfHow to overcome obstacles in the way of success.pdf
How to overcome obstacles in the way of success.pdf
Million-$-Knowledge {Million Dollar Knowledge}
 
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
2zjra9bn
 
Switching Careers Slides - JoyceMSullivan SocMediaFin - 2024Jun11.pdf
Switching Careers Slides - JoyceMSullivan SocMediaFin -  2024Jun11.pdfSwitching Careers Slides - JoyceMSullivan SocMediaFin -  2024Jun11.pdf
Switching Careers Slides - JoyceMSullivan SocMediaFin - 2024Jun11.pdf
SocMediaFin - Joyce Sullivan
 
A Guide to a Winning Interview June 2024
A Guide to a Winning Interview June 2024A Guide to a Winning Interview June 2024
A Guide to a Winning Interview June 2024
Bruce Bennett
 
Connect to Grow: The power of building networks
Connect to Grow: The power of building networksConnect to Grow: The power of building networks
Connect to Grow: The power of building networks
Eirini SYKA-LERIOTI
 
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
taqyea
 
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
evnum
 
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAANBUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
cahgading001
 
0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf
Thomas GIRARD BDes
 
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
kkkkr4pg
 
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
gnokue
 
Learnings from Successful Jobs Searchers
Learnings from Successful Jobs SearchersLearnings from Successful Jobs Searchers
Learnings from Successful Jobs Searchers
Bruce Bennett
 
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
evnum
 
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
1wful2fm
 

Recently uploaded (20)

在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
在线办理(UOIT毕业证书)安大略省理工大学毕业证在读证明一模一样
 
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
一比一原版(EUR毕业证)鹿特丹伊拉斯姆斯大学毕业证如何办理
 
Community Skills Building Workshop | PMI Silver Spring Chapter | June 12, 2024
Community Skills Building Workshop | PMI Silver Spring Chapter  | June 12, 2024Community Skills Building Workshop | PMI Silver Spring Chapter  | June 12, 2024
Community Skills Building Workshop | PMI Silver Spring Chapter | June 12, 2024
 
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
体育博彩论坛-十大体育博彩论坛-体育博彩论坛|【​网址​🎉ac55.net🎉​】
 
All Of My Java Codes With A Sample Output.docx
All Of My Java Codes With A Sample Output.docxAll Of My Java Codes With A Sample Output.docx
All Of My Java Codes With A Sample Output.docx
 
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
Gabrielle M. A. Sinaga Portfolio, Film Student (2024)
 
How to overcome obstacles in the way of success.pdf
How to overcome obstacles in the way of success.pdfHow to overcome obstacles in the way of success.pdf
How to overcome obstacles in the way of success.pdf
 
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
官方认证美国旧金山州立大学毕业证学位证书案例原版一模一样
 
Switching Careers Slides - JoyceMSullivan SocMediaFin - 2024Jun11.pdf
Switching Careers Slides - JoyceMSullivan SocMediaFin -  2024Jun11.pdfSwitching Careers Slides - JoyceMSullivan SocMediaFin -  2024Jun11.pdf
Switching Careers Slides - JoyceMSullivan SocMediaFin - 2024Jun11.pdf
 
A Guide to a Winning Interview June 2024
A Guide to a Winning Interview June 2024A Guide to a Winning Interview June 2024
A Guide to a Winning Interview June 2024
 
Connect to Grow: The power of building networks
Connect to Grow: The power of building networksConnect to Grow: The power of building networks
Connect to Grow: The power of building networks
 
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
一比一原版布拉德福德大学毕业证(bradford毕业证)如何办理
 
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
按照学校原版(UofT文凭证书)多伦多大学毕业证快速办理
 
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAANBUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
BUKU PENJAGAAN BUKU PENJAGAAN BUKU PENJAGAAN
 
0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf0624.speakingengagementsandteaching-01.pdf
0624.speakingengagementsandteaching-01.pdf
 
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
办理阿卡迪亚大学毕业证(uvic毕业证)本科文凭证书原版一模一样
 
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
一比一原版(surrey毕业证书)英国萨里大学毕业证成绩单修改如何办理
 
Learnings from Successful Jobs Searchers
Learnings from Successful Jobs SearchersLearnings from Successful Jobs Searchers
Learnings from Successful Jobs Searchers
 
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
按照学校原版(ArtEZ文凭证书)ArtEZ艺术学院毕业证快速办理
 
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
一比一原版美国西北大学毕业证(NWU毕业证书)学历如何办理
 

linux-memory-explained.pdf

  • 2. Copyright © 2017 Accenture All rights reserved. | 2 ENVIRONMENT Virtual machine: VirtualBox 5.1.28 r117968 Operating system: Oracle Linux 7.3 Kernel: 4.1.12-94.5.9.el7uek.x86_64 Oracle database: 12.2.0.1.170814 Amount of memory: 4GB
  • 3. Copyright © 2017 Accenture All rights reserved. | 3 BOOT An intel based system first starts it’s firmware, which is either BIOS or UEFI. The firmware tries to boot. When it boots from disk: • BIOS: scans disks for a MBR, which loads a boot loader: grub / grub2. • UEFI: scans disk for GUID partition table, read system partition, which executes boot loader: grub2. Grub(2) loads the linux kernel into memory: # dmesg | grep Command [ 0.000000] Command line: BOOT_IMAGE=/ vmlinuz-4.1.12-94.3.4.el7uek.x86_64 root=/dev/mapper/ol-root ro crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never
  • 4. Copyright © 2017 Accenture All rights reserved. | 4 BOOT Of course this uses some memory: # dmesg | grep Memory [ 0.000000] Memory: 3743704K/4193848K available (7417K kernel code, 1565K rwdata, 3812K rodata, 1888K init, 3228K bss, 433760K reserved, 16384K cma-reserved) This means the kernel left approximately 3.6G of the 4.0G of PHYSICAL memory for use after loading. This can be verified by looking at /proc/meminfo: # head -1 /proc/meminfo MemTotal: 3781460 kB
  • 5. Copyright © 2017 Accenture All rights reserved. | 5 VIRTUAL MEMORY Linux is an operating system that provides virtual memory. • This means the addresses the processes see, are virtual memory addresses which are unique to that process. • This is called an ‘address space’. • All memory allocations are only visible to the process that allocated these. • The translation from virtual memory addresses to physical addresses is done by hardware in the CPU, using what is called the ‘memory management unit’ or MMU. • This is why specialistic memory management, like hugepages, is not only dependent on operating system support, but also on hardware support. • The management of physical memory is done by the operating system. • This is why swapping occurs by an operating system process.
  • 6. Copyright © 2017 Accenture All rights reserved. | 6 ADDRESS SPACE A linux 64 bit process address space consists of the full 2^64 addresses: # cat /proc/self/maps 00400000-0040b000 r-xp 00000000 f9:00 67398280 /usr/bin/cat 0060b000-0060c000 r--p 0000b000 f9:00 67398280 /usr/bin/cat 0060c000-0060d000 rw-p 0000c000 f9:00 67398280 /usr/bin/cat 023c4000-023e5000 rw-p 00000000 00:00 0 [heap] …lot's of other output… 7ffc4880e000-7ffc4882f000 rw-p 00000000 00:00 0 [stack] 7ffc48840000-7ffc48842000 r--p 00000000 00:00 0 [vvar] 7ffc48842000-7ffc48844000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] # echo "obase=16;2^64-1" | bc FFFFFFFFFFFFFFFF That’s 16 exa bytes! For every process!
  • 7. Copyright © 2017 Accenture All rights reserved. | 7 PHYSICAL MEMORY LIMITATIONS Please mind the 16 exabytes is virtual memory! Physical memory is limited by multiple factors. First factor is physical implementation of memory addressing on the CPU: # grep address size /proc/cpuinfo address sizes : 46 bits physical, 48 bits virtual address sizes : 46 bits physical, 48 bits virtual address sizes : 46 bits physical, 48 bits virtual address sizes : 46 bits physical, 48 bits virtual This is saying the linux kernel detects a CPU with 46 bits for addressing memory. # echo "2^46" | bc 70368744177664 That’s 64 terabyte. Per CPU socket.
  • 8. Copyright © 2017 Accenture All rights reserved. | 8 PHYSICAL MEMORY LIMITATIONS 64 terabyte is a lot. In a (typical) 2 socket setup, that is 128 terabyte of memory potentially addressable. The second limitation is the actual major limitation on physical memory. That is the on-die memory controller supported physical memory types and configuration, and the physical hardware motherboard setup.
  • 9. Copyright © 2017 Accenture All rights reserved. | 9 PHYSICAL MEMORY LIMITATIONS This is the DIMM and processor physical layout from the X6-2 service manual: It has got 12 DIMM slots per socket, and the manual describes it supports DIMMs up to 64G in size. This means that the maximal amount of physical memory is: 64*12*2=1536G = 1.5T!
  • 10. Copyright © 2017 Accenture All rights reserved. | 10 MEMORY: PHYSICAL AND VIRTUAL Okay. So far we got: Every linux process has its own address space of 16 exabytes of addressable virtual memory. A typical current Intel Xeon CPU has 46 bits physically available for memory addressing, which means every CPU can theoretically address 64T. A system is limited in physical memory by memory module size, number and populating rules dictated by the on die memory manager(s) (type of memory, number of channels and number of slots per channel)*, and the physical memory motherboard configuration. On an Oracle X6-2 system this is maximally 1.5T. Same for X7-2.
  • 11. Copyright © 2017 Accenture All rights reserved. | 11 EXECUTABLE MEMORY USAGE What happens when an executable is executed in a shell on linux? Common thinking is: • The process is cloned. • Loads the executable. • Executes it. • Returns the return code to the parent process. Memory needed = executable size. Which very roughly is what happens, but there’s a lot more to it than that! …and memory size that is needed is commonly NOT executable size!
  • 12. Copyright © 2017 Accenture All rights reserved. | 12 EXECUTABLE MEMORY USAGE A linux ELF executable is not entirely loaded into memory and then executed. For the ELF executable itself, some memory regions are allocated: # less /proc/self/maps 00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less 00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less 00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less 00625000-0062a000 rw-p 00000000 00:00 0 0255d000-0257e000 rw-p 00000000 00:00 0 [heap] 7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive 7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so …etc…
  • 13. Copyright © 2017 Accenture All rights reserved. | 13 EXECUTABLE MEMORY USAGE 00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less Code segment. Contains code to execute, is obviously readonly and executable. 00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less Rodata segment. Contains readonly variables, obviously is readonly too. 00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less Data segment. Contains variables, obviously NOT readonly. 00625000-0062a000 rw-p 00000000 00:00 0 BSS segment. Contains statically allocated variables that are zero-valued initially. These are possible segments. There can be more and others. Code and data segments seem to be always used.
  • 14. Copyright © 2017 Accenture All rights reserved. | 14 EXECUTABLE MEMORY USAGE 00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less 00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less 00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less 00625000-0062a000 rw-p 00000000 00:00 0 421000 - 400000 = 21000 621000 - 620000 = 1000 625000 - 621000 = 4000 + 0x26000 = 155648 = 152K # du -sh $(which less) 156K /bin/less
  • 15. Copyright © 2017 Accenture All rights reserved. | 15 EXECUTABLE MEMORY USAGE A way to look at the total size of a process is looking at the VSZ/VSS or virtual set size: # ps -o pid,vsz,cmd -C less PID VSZ CMD 19477 110256 less /proc/self/maps (VSZ is in KB) However, this tells the virtual set size is 110MB! That is very much different from 152K of the ‘less’ executable?!
  • 16. Copyright © 2017 Accenture All rights reserved. | 16 EXECUTABLE MEMORY USAGE Let’s execute less /proc/self/maps again: 00400000-00421000 r-xp 00000000 f9:00 67316594 /usr/bin/less 00620000-00621000 r--p 00020000 f9:00 67316594 /usr/bin/less 00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less 00625000-0062a000 rw-p 00000000 00:00 0 0255d000-0257e000 rw-p 00000000 00:00 0 [heap] 7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive 7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f5150838000-7f515083c000 r--p 001b6000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f515083c000-7f515083e000 rw-p 001ba000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f515083e000-7f5150843000 rw-p 00000000 00:00 0 7f5150843000-7f5150868000 r-xp 00000000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9 7f5150868000-7f5150a67000 ---p 00025000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9 7f5150a67000-7f5150a6b000 r--p 00024000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9 7f5150a6b000-7f5150a6c000 rw-p 00028000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9 …etc… There’s more stuff visible in the address space!
  • 17. Copyright © 2017 Accenture All rights reserved. | 17 EXECUTABLE MEMORY USAGE The ‘less’ executable is a dynamic linked executable. This can be seen by using the ‘ldd’ executable: shared library (loader) dependencies: # ldd $(which less) linux-vdso.so.1 => (0x00007ffecc5bd000) libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fe0db553000) libc.so.6 => /lib64/libc.so.6 (0x00007fe0db192000) /lib64/ld-linux-x86-64.so.2 (0x000055fde3a3b000) This means the less executable requires four libraries to run! • linux-vdso.so.1 Meta-library, injected by the kernel to solve slow sys calls. • libtinfo.so.5 Terminal info library. • libc.so.6 General c library. • ld-linux-x86-64.so.2 Program interpreter for loading libraries.
  • 18. Copyright © 2017 Accenture All rights reserved. | 18 EXECUTABLE MEMORY USAGE Okay that’s: /bin/less 156K /lib64/libtinfo.so.5.9 168K /lib64/libc-2.17.so 2.1 M /lib64/ld-2.17.so 156K + 2.580M versus VSZ: 110’256K !! Turns out there is the following entry in maps: 7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive # du -sh /usr/lib/locale/locale-archive 102M /usr/lib/locale/locale-archive Ah! 2.5M (exec+libs) + 102M (locale-archive) + some other bits in maps is approx. 110’245K!
  • 19. Copyright © 2017 Accenture All rights reserved. | 19 VIRTUAL SET SIZE That is VSZ, virtual set size, explained: • The total size of all memory allocations in an address space. Now that you know about virtual set size: forget about it. It’s a useless statistic for determining memory usage, it says nothing… at all!
  • 20. Copyright © 2017 Accenture All rights reserved. | 20 VIRTUAL SET SIZE • Lazy loading and allocation. • Only the bare minimum of a file is read into the address space. • Page fault • Accessing a page of a memory allocation that hasn’t been read before in the address space causes the MMU to trigger an interrupt which calls the kernel ‘page fault’ routine. • ‘fault’ means the page is not yet available. • Causing this page to be read into the process’ address space. • Commonly referred to as ‘page in’. There is a statistic that shows the amount of pages actually available in the address space. This is what is called ‘resident set size’ or RSS.
  • 21. Copyright © 2017 Accenture All rights reserved. | 21 RESIDENT SET SIZE Let’s look at the less command with ‘ps’ including the RSS column: # ps -o pid,vsz,rss,cmd -C less PID VSZ RSS CMD 21908 110256 1940 less /proc/self/maps That is 1940K versus 110256K !!!!! Do you see how much work has been prevented by using lazy loading??? Actually, you can see how much is resident per memory allocation by looking at the ‘smaps’ file in proc.
  • 22. Copyright © 2017 Accenture All rights reserved. | 22 RESIDENT SET SIZE # less /proc/21908/smaps … 7fc57c21c000-7fc582745000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive Size: 103588 kB Rss: 436 kB Pss: 48 kB Shared_Clean: 436 kB Shared_Dirty: 0 kB … Only a tiny portion of /usr/lib/locale/locale-archive is actually used (Size versus Rss).
  • 23. Copyright © 2017 Accenture All rights reserved. | 23 RESIDENT SET SIZE So, now we got virtual set size (VSZ or sometimes VSS) that is the full size of files and memory allocations, which only very seldom gets truly allocated. It is not a useful statistic for determining memory usage. For being able to understand what is truly used in an address space we got resident set size (RSS). Does RSS show the actual memory used by a process? No. Linux has another trick up its sleeve. Any page that is in the address space of a process, is shared with all of its child processes. By using a technique called ‘copy on write’ (COW), changes to a page make that page unique for that process.
  • 24. Copyright © 2017 Accenture All rights reserved. | 24 PROPORTIONAL SET SIZE So, we apparently got this fantastic page sharing mechanism. But is there any way to see how and if this is being used? Yes. Linux keeps a statistic in ‘smaps’ that is called ‘Pss’: proportional set size. The PSS value is calculated in the following way: • The sum of the size of every resident page, • For which the size of every page is divided by the number of processes it is shared with.
  • 25. Copyright © 2017 Accenture All rights reserved. | 25 PROPORTIONAL SET SIZE Let’s look at the ‘smaps’ file for the less command again, at the ‘locale-archive’ allocation: # less /proc/21908/smaps … 7fc57c21c000-7fc582745000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive Size: 103588 kB Rss: 436 kB Pss: 48 kB Shared_Clean: 436 kB Shared_Dirty: 0 kB … The full size is 103588K, of which only 436K is resident in the process space, which is shared with other processes, because proportional only 48K is used!
  • 26. Copyright © 2017 Accenture All rights reserved. | 26 ORACLE MEMORY USAGE Of course linux applies these techniques to running Oracle too: # grep -e - -e ^Size -e ^Rss -e ^Pss /proc/$(pidof ora_vktm_test)/smaps 00400000-13ac8000 r-xp 00000000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle Size: 318240 kB Rss: 43192 kB Pss: 1916 kB 13cc7000-13ead000 r--p 136c7000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle Size: 1944 kB Rss: 884 kB Pss: 37 kB 13ead000-13ef3000 rw-p 138ad000 f9:02 214249537 /u01/app/12.2.0.1/grid/bin/oracle Size: 280 kB Rss: 24 kB Pss: 24 kB …etc…
  • 27. Copyright © 2017 Accenture All rights reserved. | 27 LINUX SHARED MEMORY How about shared memory? There is no ‘owner’ of shared memory, it is used by every process it is shared with. Every processes will page in whatever it actually needs: System V shared memory: 60c00000-96800000 rw-s 00000000 00:05 163844 /SYSV00000000 (deleted) Size: 880640 kB Rss: 884 kB Pss: 124 kB But does shared memory need to exist BEFORE processes can use it?
  • 28. Copyright © 2017 Accenture All rights reserved. | 28 LINUX SHARED MEMORY No: # ipcs -mu ------ Shared Memory Status -------- segments allocated 4 pages allocated 256005 pages resident 83055 pages swapped 0 Swap performance: 0 attempts 0 successes The same lazy allocation mechanism applies to shared memory too. (pages allocated versus pages resident)
  • 29. Copyright © 2017 Accenture All rights reserved. | 29 ORACLE SHARED MEMORY PAGING Actually, there are a few settings that influences the way Oracle uses shared memory. _touch_sga_pages_during_allocation During startup the bequeathing process touches SGA pages, which allocates them. Default value: false. (Setting it to true doesn’t seem to do anything on my test systems) pre_page_sga In the nomount phase a process (sa00) is spawned by the database that touches SGA pages, and vanishes after it’s done. Default value: false*. This means that by default Oracle allocates SGA in a lazy manner too!
  • 30. Copyright © 2017 Accenture All rights reserved. | 30 HUGE / LARGE PAGES Hugepages is a joint operating system and hardware (MMU) feature to increase the memory administration unit (‘page’) size. Regular page size is 4K. 2MB page size support is dependent on cpu flag ‘pse’: # grep pse /proc/cpuinfo flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 1GB page size support is dependent on cpu flag ‘pdpe1gb’: # grep pdpe1gb /proc/cpuinfo flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
  • 31. Copyright © 2017 Accenture All rights reserved. | 31 HUGE / LARGE PAGES Why use hugepages? • Page addresses are cached in a memory area that is called ‘transaction lookaside buffer’ or TLB. • Having fewer entries improves TLB cache hit ratio, and reduces CPU usage. • Hugepages are non-swappable. • Hugepages reduces per process page address administration size: Pagetable size. Hugepages memory creation requires physical memory that has not been used by 4K pages (‘fragmented’) previously. Memory that is configured for hugepages, is not usable for regular 4K allocations ! ! ! ! Even if none of the hugepages are used by any process.
  • 32. Copyright © 2017 Accenture All rights reserved. | 32 HUGE / LARGE PAGES # grep ^HugePage /proc/meminfo HugePages_Total: 1126 HugePages_Free: 784 HugePages_Rsvd: 159 HugePages_Surp: 0 Used 1126 (tot) - 784 (free) = 342 Used Reserved = 159 Reserved Real free 784 (free) - 159 (rsvd) = 625 Free + 1126 Total
  • 33. Copyright © 2017 Accenture All rights reserved. | 33 HUGE / LARGE PAGES The reporting for hugepages memory usage by ipcs is ‘kind of’ broken: # ipcs -mu ------ Shared Memory Status -------- segments allocated 4 pages allocated 256517 pages resident 175105 256517 * 4K = 1002M. In other words: ipcs -mu reports the hugepages as regular (4K) pages!
  • 34. Copyright © 2017 Accenture All rights reserved. | 34 HUGE / LARGE PAGES The reporting for hugepages memory allocations in /proc/PID/smaps is ‘kind of’ broken too: # less /proc/$(pidof ora_pmon_test)/smaps 60400000-96400000 rw-s 00000000 00:0e 360449 /SYSV00000000 (deleted) Size: 884736 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB AnonHugePages: 0 kB Swap: 0 kB KernelPageSize: 2048 kB MMUPageSize: 2048 kB This is the variable SGA shared memory alloc. Rss says we allocated nothing. Not true.
  • 35. Copyright © 2017 Accenture All rights reserved. | 35 ANONYMOUS ALLOCATIONS Outside of the code, data and rodata allocations mapped from a file into memory, there’s memory allocations NOT backed by a file. 00621000-00625000 rw-p 00021000 f9:00 67316594 /usr/bin/less 00625000-0062a000 rw-p 00000000 00:00 0 0255d000-0257e000 rw-p 00000000 00:00 0 [heap] 7f5149f59000-7f5150482000 r--p 00000000 f9:00 587351 /usr/lib/locale/locale-archive 7f5150482000-7f5150639000 r-xp 00000000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f5150639000-7f5150838000 ---p 001b7000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f5150838000-7f515083c000 r--p 001b6000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f515083c000-7f515083e000 rw-p 001ba000 f9:00 134334316 /usr/lib64/libc-2.17.so 7f515083e000-7f5150843000 rw-p 00000000 00:00 0 7f5150843000-7f5150868000 r-xp 00000000 f9:00 134567523 /usr/lib64/libtinfo.so.5.9 The allocations above are BSS segments. Memory explicitly allocated using malloc() shows up identical, also being anonymous allocations. The PGA of Oracle processes is allocated in anonymous allocations.
  • 36. Copyright © 2017 Accenture All rights reserved. | 36 MEMORY USAGE CALCULATION Is there a simple way to calculate actual, total, memory usage for Oracle on Linux? Answer: no, not really. VSZ: has nothing to do with actual usage. RSS: describes memory allocation for a process, the pages are allocated, but pages can be shared, so RSS will hugely oversubscribe. Unless RSS = PSS. PSS: describes proportional memory usage, only if all processes PSS sizes are calculated together, at the same time, it will tell actual allocation (this is VERY fluent information!). For process private memory, PSS is the way to go. But for shared memory, you can’t use a process centric method. Memory could be allocated by a process that quit.
  • 37. Copyright © 2017 Accenture All rights reserved. | 37 MEMORY USAGE CALCULATION The only way to realistically say something about linux memory usage, is using these statistics in /proc/ meminfo: • MemFree: idle memory. • KernelStack, Slab: kernel memory. • Buffers: linux kernel buffers for IO. • SwapCached: cache for swapped pages. • Cached (- Shmem): linux page cache and memory mapped files. • PageTables: virtual to physical memory addresses administration. • Shmem: small pages shared memory allocation. • AnonPages: all anonymous memory allocations. • Hugepages_Total - Hugepages_Free: Used hugepages (number in pages) • Hugepages_Rsvd: Logically allocated, not yet physically allocated hugepages (number in pages) • Hugepages_Free-Hugepages_Rsvd: true free hugepages (number in pages)
  • 38. Copyright © 2017 Accenture All rights reserved. | 38 DEMO: LIVE MEMORY USAGE If you want to do the same yourself: 1. Install node_exporter (https://github.com/prometheus/node_exporter) • This is the fetcher (‘exporter’) of the statistics. 2. Install prometheus (https://github.com/prometheus/prometheus) • This is the persistence layer for the statistics. 3. Install grafana (http://grafana.com) • This is the presentation/visualisation layer. 4. Setup prometheus to fetch data from node_exporter (https://resin.io/blog/monitoring- linux-stats-with-prometheus-io/) • node_exporter → prometheus. 5. Add prometheus as a grafana source (https://prometheus.io/docs/visualization/grafana/) • prometheus → grafana 6. Import grafana dashboard id: 2747 (https://fritshoogland.wordpress.com/2017/07/31/installation-overview-of-node_exporter- prometheus-and-grafana/)
  • 39. Copyright © 2017 Accenture All rights reserved. | 39
  • 40. Copyright © 2017 Accenture All rights reserved. | 40 Now let’s startup the Oracle database. Important settings: • sga_target = 2G • pga_aggregate_target = 500M* • filesystemio_options = setall DEMO: LIVE MEMORY USAGE
  • 41. Copyright © 2017 Accenture All rights reserved. | 41
  • 42. Copyright © 2017 Accenture All rights reserved. | 42
  • 43. Copyright © 2017 Accenture All rights reserved. | 43
  • 44. Copyright © 2017 Accenture All rights reserved. | 44
  • 45. Copyright © 2017 Accenture All rights reserved. | 45
  • 46. Copyright © 2017 Accenture All rights reserved. | 46
  • 47. Copyright © 2017 Accenture All rights reserved. | 47 I didn’t mention huge pages (yet)! Let’s see how that looks like: DEMO: LIVE MEMORY USAGE
  • 48. Copyright © 2017 Accenture All rights reserved. | 48
  • 49. Copyright © 2017 Accenture All rights reserved. | 49
  • 50. Copyright © 2017 Accenture All rights reserved. | 50
  • 51. Copyright © 2017 Accenture All rights reserved. | 51
  • 52. Copyright © 2017 Accenture All rights reserved. | 52
  • 53. Copyright © 2017 Accenture All rights reserved. | 53
  • 54. Copyright © 2017 Accenture All rights reserved. | 54
  • 55. Copyright © 2017 Accenture All rights reserved. | 55 A computer system can use a limited amount of memory because of memory manager limitations and physical slots. The linux kernel needs some memory to run, which means MemTotal in /proc/meminfo will not give you the total physical amount of memory. An executable can be ‘dynamically linked’, which means it uses libraries for additional functionality. The oracle database executable is an example of that. CONCLUSION
  • 56. Copyright © 2017 Accenture All rights reserved. | 56 The full size of an allocation is called virtual set size, which is not a useful statistic for determining memory usage. The amount of pages for any allocation in an address space that is truly directly available is called the resident set size. Any process shares its memory allocations with its child processes. Only once a page is modified, it is made unique to a process. The page sharing can be seen by looking at the resident set size and the proportional set size in /proc/ PID/smaps. CONCLUSION
  • 57. Copyright © 2017 Accenture All rights reserved. | 57 Huge pages memory can only be used for huge pages allocations. Even if there is a shortage of small pages. Shortage of (smallpages) memory results in pages being moved to the swap device. Cached/Mapped and Anonymous pages are the first to be swapped. Shared memory pages will be swapped when memory pressure persists. Other/kernel related pages can be swapped too, but do not seem to get swapped too much. PageTables: if these take a significant amount, consider hugepages. CONCLUSION
  • 58. Copyright © 2017 Accenture All rights reserved. | 58 The lazy loading can make memory shortage/swapping diagnosis hard. Any memory allocation is done lazy. The Oracle SGA is not fully physically allocated at startup (depending on settings). This means when sized too big (especially in a consolidated environment) it might take some time before memory is truly allocated and memory pressure becomes visible. The Oracle PGA is not allocated at startup, but allocated per process on demand, and can free memory too. PGA_AGGREGATE_TARGET is a target value for instance work area’s. PGA memory always is in Anonymous memory allocations on linux. PGA can easily allocate huge amounts of memory, for example with ‘collections’. Having lots of sessions and lots of cursors can lead to a lot of Anonymous memory locked and wasted. CONCLUSION
  • 59. Copyright © 2017 Accenture All rights reserved. | 59 Because any memory operation is done lazy, moving swapped pages back into memory is also done lazy. This means a past memory shortage can lead to swap being in use currently, with no actual memory problem. There only is a memory pressure/swapping problem if there is a persistent swapping out and swapping in happening, or single way at a high rate, leading to processes getting stuck. Not because swap is filled for X% CONCLUSION
  • 60. Copyright © 2017 Accenture All rights reserved. | 60 Q & A