Slides for my talk at the London Perl Workshop in Nov 2013, featuring the Devel::SizeMe perl module.
See also the screencast at https://archive.org/details/Perl-Memory-Profiling-LPW2013
Slides for my talk at SkyCon'12 in Limerick.
Here I've squeezed four talks into one, covering a lot of ground quickly, so I've included links to more detailed presentations and other resources.
Slides for my Perl Memory Use talk at YAPC::Asia in Tokyo, September 2012.
(This uploaded version includes quite a few slides from the OSCON version that I skipped at YAPC::Asia in order to have more time for a demo.)
DBD::Gofer is the scalable stateless proxy driver for Perl DBI.
These are the slides for my lightning talk on DBD::Gofer given at the Italian Perl Workshop in 2008 (with a few extra slides added).
Application Logging in the 21st Century (2014) - Tim Bunce
Slides for my talk at the Austrian Perl Workshop in Salzburg on October 10th.
A video of the talk can be found at https://www.youtube.com/watch?v=4Qj-_eimGuE
Slides of my talk on Devel::NYTProf and optimizing perl code at YAPC::NA in June 2014. It covers use of NYTProf and outlines a multi-phase approach to optimizing your perl code.
A video of the talk and questions is available at https://www.youtube.com/watch?v=T7EK6RZAnEA&list=UU7y4qaRSb5w2O8cCHOsKZDw
Devel::NYTProf 2009-07 (OUTDATED, see 201008) - Tim Bunce
The slides of my "State-of-the-art Profiling with Devel::NYTProf" talk at OSCON in July 2009.
I'll upload a screencast and give the link in a blog post at http://blog.timbunce.org
This is a slightly updated draft of a talk I was planning on giving at Hadoop Summit in 2015. However the abstract was rejected. Rather than toss it, I'm going to share it with all of you on the (almost) 1 year anniversary of the first big commit of this feature!
Keep in mind that this is (currently) locked away in trunk. If you ever want this to see the light of day, bug your vendors....
How to Develop Puppet Modules: From Source to the Forge With Zero Clicks - Carlos Sanchez
Puppet Modules are a great way to reuse code, share your development with other people and take advantage of the hundreds of modules already available in the community. But how do you create, test and publish them as easily as possible? Now that infrastructure is defined as code, we need to use development best practices to build, test, deploy and use Puppet modules themselves. Three steps make for a fully automated process:
* Continuous Integration of Puppet Modules
* Automatic release and upload to the Puppet Forge
* Deploy to Puppet master
Terraform Immutablish Infrastructure with Consul-Template - Zane Williamson
Abstract: Terraform Immutablish Infrastructure with Consul-Template
• What is immutablish infrastructure? How is it different from Immutable?
• Present one of the ways we use Terraform and Consul-Template at Trulia
• What to consider when going down this route, because it is not a "silver bullet"
• Twitter: @zane_williamson, GitHub: @sepulworld
https://www.youtube.com/watch?v=Di3yZ08tsO8
Introductory Overview to Managing AWS with Terraform - Michael Heyns
From the AWS NZ Auckland Community Meetup - May 4th 2017
https://www.meetup.com/AWS_NZ/events/236169428/
We get a first look at Hashicorp's Terraform and how to use it for Infrastructure as Code with Amazon Web Services.
We'll also share how it fits in with the current CI/CD workflow of the Invenco cloud services team.
Sample code available at https://github.com/beanaroo/aws_nz_meetup-terraform_intro
Roll Your Own API Management Platform with nginx and Lua - Jon Moore
We recently replaced a proprietary API management solution with an in-house implementation built with nginx and Lua that let us get to a continuous delivery practice in a handful of months. Learn about our development process and the overall architecture that allowed us to write minimal amounts of code, enjoying native code performance while permitting interactive coding, and how we leveraged other open source tools like Vagrant, Ansible, and OpenStack to build an automation-rich delivery pipeline. We will also take an in-depth look at our capacity management approach, which differs from the rate limiting concept prevalent in the API community.
"Сравнение" инструментов анализа памяти в perl.
Текста мало, но я надеюсь целевая аудитория поймёт:)
Примеры кода использованные в презентации тут: https://github.com/kadavr/yapc-russia-2016
PCD – Process Control Daemon is a light-weight system level process manager for Embedded-Linux based projects (consumer electronics, network devices, etc.).
PCD starts, stops and monitors all the user space processes in the system, in a synchronized manner, using a textual configuration file.
PCD recovers the system in case of errors and provides useful and detailed debug information.
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra... - peknap
Reducing memory usage is well covered in the history of this conference, yet new tricks still exist. When optimizing memory footprint for a home gateway device, the author found some unexpected places where small changes can save a valuable amount of DRAM or Flash space. This talk will visit several areas. Kernel: fragmentation threshold, the page frame reclamation task and atomic memory. Application level: memory-inefficient shared libraries due to ABI compliance and dynamic loading. Toolchain: tuning malloc allocator parameters and compiler options. System level: a general kernel might be more memory efficient than MMU-less uClinux, and preventing lock-up when the system is on the brink of running out of memory.
Workshop - Linux Memory Analysis with Volatility - Andrew Case
Slides from my 3-hour workshop at Blackhat Vegas 2011. Covers using Volatility to perform Linux memory analysis investigations, as well as Linux kernel internals.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management - Emery Berger
This talk answers an age-old question: is garbage collection faster/slower/the same speed as malloc/free? We introduce oracular memory management, an approach that lets us measure unaltered Java programs as if they used malloc and free. The result: a good GC can match the performance of a good allocator, but it takes 5X more space. If physical memory is tight, however, conventional garbage collectors suffer an order-of-magnitude performance penalty.
A lot of data scientists use the python library pandas for quick exploration of data. The most useful construct in pandas (based on R, I think) is the dataframe, which is a 2D array (aka matrix) with the option to “name” the columns (and rows). But pandas is not distributed, so there is a limit on the data size that can be explored.
Spark is a great map-reduce like framework that can handle very big data by using a shared nothing cluster of machines.
This work is an attempt to provide a pandas-like DSL on top of spark, so that data scientists familiar with pandas have a very gradual learning curve.
Find out which is faster, SQL or NoSQL, for traditional reporting tasks. Discover how you can optimise MongoDB aggregation pipelines and how to push complex computation down to the database.
Speaker: Akira Kurogane, Senior Technical Services Engineer, MongoDB
Level: 300 (Advanced)
Track: Performance
One week your active dataset consumes 90% of available RAM. The next week it's 110%. Is that a 10% or 99% performance degradation? Let's discover what it looks like when different hardware capacity limitations are hit. For example, memory vs. disk bottlenecks, the rare CPU bottleneck and network bottlenecks, seeing what happens when you drop a crucial index during peak load, or what happens when you run multiple WiredTiger nodes on the same server without limiting their cache size.
What You Will Learn:
- Performance analysis
- Post-mortem log analysis
- Capacity planning
These slides were presented on a Software Craftsmanship meetup @ EPAM Hungary on 26 January, 2017.
During the talk we went through the evolution of structured data analytics in Spark. We compared the RDD, the SparkSQL (DataFrame) and the DataSet APIs. We used the very latest and greatest Spark 2.1, released on December 28, went through code samples and dove deep into Spark optimizations. The code samples can be downloaded from here: https://github.com/symat/spark-api-comparison
Fixed-width data can be processed efficiently in Perl using forks and shared file handles. This talk describes the basic mechanism and alternatives for improving performance when dealing with the records.
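A minimal sketch of that mechanism (my own illustration under assumptions, not the talk's actual code; the file name and record length are hypothetical): forked workers inherit the parent's file handle, so they share a single file offset, and each sysread() of exactly one record length hands the next unclaimed record to exactly one worker.

use strict;
use warnings;

# Hypothetical input: a file of fixed-width 80-byte records.
my $file    = 'records.dat';
my $rec_len = 80;
my $workers = 4;

open my $fh, '<:raw', $file or die "open $file: $!";

for (1 .. $workers) {
    defined(my $pid = fork) or die "fork: $!";
    next if $pid;                           # parent: keep forking
    # child: the inherited handle shares its file offset with its siblings,
    # so each sysread of one full record claims the next unread record
    while (my $got = sysread($fh, my $rec, $rec_len)) {
        last if $got < $rec_len;            # ignore a trailing partial record
        # ... process $rec here ...
    }
    exit 0;
}
wait for 1 .. $workers;                     # parent reaps the workers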
Getting started with Spark & Cassandra by Jon Haddad of Datastax - Data Con LA
Massively scalable, always on, and ridiculously fast. Apache Cassandra is the database chosen by Apple, Netflix, and 30 of the Fortune 100 to power their critical infrastructure. How do we analyze petabytes of data, whether it be massive batching or as it’s ingested via streaming with Apache Kafka? Enter Apache Spark. Challenging MapReduce head on, Apache Spark offers powerful constructs that make it possible to slice and dice your data, whether it be through machine learning, graph queries, as well as transformations familiar to people with functional programming backgrounds such as map, filter, and reduce. Step away ready to rock with the most powerful distributed database, scalable messaging, and analytics platform on the planet.
Watch the video here
https://www.youtube.com/watch?v=X-FKmKc9hkI
Mike Pittaro - High Performance Hardware for Data Analysis - PyData
Choosing hardware for big data analysis is difficult because of the many options and variables involved. The problem is more complicated when you need a full cluster for big data analytics.
This session will cover the basic guidelines and architectural choices involved in choosing analytics hardware for Spark and Hadoop. I will cover processor core and memory ratios, disk subsystems, and network architecture. This is a practical advice oriented session, and will focus on performance and cost tradeoffs for many different options.
In-Memory Logical Data Warehouse for accelerating Machine Learning Pipelines ... - Gianmario Spacagna
Abstract:
Legacy enterprise architectures still rely on a relational data warehouse and require moving and syncing with the so-called "Data Lake", where raw data is stored and periodically ingested into a distributed file system such as HDFS.
Moreover, there are a number of use cases where you might want to avoid storing data on the development cluster disks, such as for regulations or reducing latency, in which case Alluxio (previously known as Tachyon) can make this data available in-memory and shared among multiple applications.
We propose an Agile workflow by combining Spark, Scala, DataFrame (and the recent DataSet API), JDBC, Parquet, Kryo and Alluxio to create a scalable, in-memory, reactive stack to explore data directly from source and develop high quality machine learning pipelines that can then be deployed straight into production.
In this talk we will:
* Present how to load raw data from an RDBMS and use Spark to make it available as a DataSet
* Explain the iterative exploratory process and advantages of adopting functional programming
* Present a critical analysis of the issues faced with the existing methodology
* Show how to deploy Alluxio and how it greatly improved the existing workflow by providing the desired in-memory solution and by decreasing the loading time from hours to seconds
* Discuss some future improvements to the overall architecture
Bio:
Gianmario is a Senior Data Scientist at Pirelli Tyre, processing telemetry data for smart manufacturing and connected vehicles applications.
His main expertise is on building production-oriented machine learning systems.
Co-author of the Professional Manifesto for Data Science (datasciencemanifesto.com), founder of the Data Science Milan meetup group, and currently writing the "Python Deep Learning" book (to be published soon).
He loves evangelising his passion for best practices and effective methodologies amongst the community.
Prior to Pirelli, he worked in Financial Services (Barclays), Cyber Security (Cisco) and Predictive Marketing (AgilOne).
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... - Odinot Stanislas
After a short intro to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks: sequential tests, random tests and, above all, a comparison of results before and after optimization. The configuration parameters touched and the optimizations applied (large page numbers, Omap data on a separate disk, ...) bring at least a 2x performance gain.
Have you recently started working with Spark and your jobs are taking forever to finish? This presentation is for you.
Himanshu Arora and Nitya Nand Yadav have gathered numerous best practices, optimizations and tweaks that they have applied over the years in production to make their jobs faster and less resource-hungry.
In this presentation, they teach us advanced Spark optimization techniques, data serialization formats, storage formats, hardware optimizations, control over parallelism, resource manager settings, better data locality, GC tuning and more.
They also show us the appropriate use of RDD, DataFrame and Dataset in order to benefit fully from Spark's internal optimizations.
Slides of my Perl 6 DBDI (database interface) talk at YAPC::EU in August 2010. Please also see the fun screencast that includes a live demo of perl6 using a perl5 DBI driver: http://timbunce.blip.tv/file/3973550/
Devel::NYTProf v3 - 200908 (OUTDATED, see 201008) - Tim Bunce
Slides of my talk on Devel::NYTProf and optimizing perl code at the Italian Perl Workshop (IPW09). It covers the new features in NYTProf v3 and a new section outlining a multi-phase approach to optimizing your perl code.
30 mins long plus 10 mins of questions. Best viewed fullscreen.
An update of my Perl Myths talk (for http://ossbarcamp.com in Dublin, Ireland, September 2009). It covers jobs, cpan, community, best practices, power tools, and perl 6.
Slides of my talk about the DashProfiler perl module, which enables lightweight always-on performance monitoring for critical sections of code. See
http://search.cpan.org/perldoc?DashProfiler
Perl Myths 200802 with notes (OUTDATED, see 200909) - Tim Bunce
Perl programming has its share of myths. This presentation debunks a few popular ones with hard facts. Surprise yourself with the realities.
THIS VERSION IS OUTDATED. PLEASE SEE http://www.slideshare.net/Tim.Bunce/perl-myths-200909
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Rik Marselis' and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that brings Artificial Intelligence into the development and use of automations.
📕 Together we will look at some examples of how Autopilot is used in the various tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. Ouch!
$ perl some_script.pl
Out of memory!
$
$ perl some_script.pl
Killed.
$
$ perl some_script.pl
$
Someone shouts: "Hey! My process has been killed!"
$ perl some_script.pl
[...later...] "Why is it taking so long?"
4. $ perl -e 'system("cat /proc/$$/stat")'    # $$ = pid
4752 (perl) S 4686 4752 4686 34816 4752 4202496 536 0 0 0 0 0 0 0 20 0 1 0 62673440 123121664
440 18446744073709551615 4194304 4198212 140735314078128 140735314077056 140645336670206 0 0
134 0 18446744071579305831 0 0 17 10 0 0 0 0 0 0 0 0 0 0 4752 111 111 111

$ perl -e 'system("cat /proc/$$/statm")'
30059 441 346 1 0 160 0

$ perl -e 'system("ps -p $$ -o vsz,rsz,sz,size")'
   VSZ   RSZ    SZ   SZ
120236  1764 30059  640

$ perl -e 'system("top -b -n1 -p $$")'
...
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
13063 tim   20   0  117m 1764 1384 S  0.0  0.1 0:00.00 perl

$ perl -e 'system("cat /proc/$$/status")'
...
VmPeak:   120236 kB
VmSize:   120236 kB   <- total (code, libs, stack, heap etc.)
VmHWM:      1760 kB
VmRSS:      1760 kB   <- how much of the total is resident in physical memory
VmData:      548 kB   <- data (heap)
VmStk:        92 kB   <- stack
VmExe:         4 kB   <- code
VmLib:      4220 kB   <- libs, including libperl.so
VmPTE:        84 kB
VmPTD:        28 kB
VmSwap:        0 kB
...
Further info on unix.stackexchange.com
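A tiny Perl helper along the same lines (my sketch, not from the slides; assumes a Linux-style /proc filesystem) makes it easy to log those Vm* figures from inside the program being measured:

use strict;
use warnings;

# Read the Vm* fields from /proc/self/status (Linux-specific).
sub vm_info {
    open my $fh, '<', '/proc/self/status' or die "open /proc/self/status: $!";
    my %vm;
    while (<$fh>) {
        $vm{$1} = $2 if /^(Vm\w+):\s+(\d+)\s+kB/;
    }
    return \%vm;
}

my $vm = vm_info();
printf "VmSize %d kB, VmRSS %d kB, VmData %d kB\n",
    @{$vm}{qw(VmSize VmRSS VmData)};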
5. Process memory layout (not to scale):
- C Program Code: int main(...) { ... }
- Read-only Data: e.g. “String constants”
- Read-write Data: un/initialized variables
- Heap
- Shared Lib Code, Shared Lib R/O Data, Shared Lib R/W Data (repeated for each lib)
- C Stack (not the perl stack)
- System
7. $ perl -e 'system("cat /proc/$$/smaps")'   # note ‘smaps’ not ‘maps’
address                   perms ...  pathname
7fb00fbc1000-7fb00fd22000 r-xp  ...  /.../5.10.1/x86_64-linux/CORE/libperl.so
Size:           1412 kB   <- size of executable code in libperl.so
Rss:             720 kB   <- amount that's currently in physical memory
Pss:             364 kB
Shared_Clean:    712 kB
Shared_Dirty:      0 kB
Private_Clean:     8 kB
Private_Dirty:     0 kB
Referenced:      720 kB
Anonymous:         0 kB
AnonHugePages:     0 kB
Swap:              0 kB
KernelPageSize:    4 kB
MMUPageSize:       4 kB
... repeated for every segment ...
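For a whole-process view, a short script (again Linux-specific, and my own illustration rather than anything from the slides) can add up the Rss and Pss of every mapping in smaps:

use strict;
use warnings;

# Sum Rss and Pss across every mapping in /proc/self/smaps (Linux-specific).
my %total;
open my $fh, '<', '/proc/self/smaps' or die "open smaps: $!";
while (<$fh>) {
    $total{$1} += $2 if /^(Rss|Pss):\s+(\d+)\s+kB/;
}
printf "Rss total: %d kB, Pss total: %d kB\n", $total{Rss}, $total{Pss};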
8. Memory Pages
✦ Process view: Large continuous regions of memory. Simple.
✦ Operating System view:
  ✦ Memory is divided into pages
  ✦ Pages are loaded to physical memory on demand
  ✦ Mapping can change without the process knowing
9. [Diagram: the memory segments above (C Program Code, Read-only Data, Read-write Data, Heap, Shared Lib Code/Data, C Stack, System), each divided into pages; some pages are ‘resident’ in physical memory, others are not.]
Memory is divided into pages. Page size is typically 4KB.
RSS “Resident Set Size” is how much process memory is currently in physical memory.
10. Key Point
✦ Don’t use Resident Set Size (RSS)
  ✦ Unless you really want to know what’s currently resident.
  ✦ It can shrink even while the process size grows.
✦ Heap size or Total memory size is a good indicator.
14. malloc manages memory allocation (Heap: perl data)
✦ malloc() requests big chunks of memory from the operating system as needed.
  Almost never returns it!
✦ Perl makes lots of malloc and free requests.
✦ Freed fragments of various sizes accumulate.
19. Devel::Peek
• Gives you a textual view of data
$ perl -MDevel::Peek -e '%a = (42 => "Hello World!"); Dump(\%a)'
SV = IV(0x1332fd0) at 0x1332fe0
  REFCNT = 1
  FLAGS = (TEMP,ROK)
  RV = 0x1346730
  SV = PVHV(0x1339090) at 0x1346730
    REFCNT = 2
    FLAGS = (SHAREKEYS)
    ARRAY = 0x1378750 (0:7, 1:1)
    KEYS = 1
    FILL = 1
    MAX = 7
    Elt "42" HASH = 0x73caace8
    SV = PV(0x1331090) at 0x1332de8
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x133f960 "Hello World!"\0
      CUR = 12   <= length in use
      LEN = 16   <= amount allocated
20. Devel::Size
• Gives you a measure of the size of a data structure
$ perl -MDevel::Size=total_size -le 'print total_size( 0 )'
24
$ perl -MDevel::Size=total_size -le 'print total_size( [] )'
64
$ perl -MDevel::Size=total_size -le 'print total_size( {} )'
120
$ perl -MDevel::Size=total_size -le 'print total_size( [ 1..100 ] )'
3264
• Created by Dan Sugalski, now maintained by Nicholas Clark
• Is very fast, and accurate for most simple data types.
• Has limitations and bugs, but is the best tool we have.
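The difference between size() and total_size() is worth seeing side by side; a quick sketch (my own example) using the module's two exported functions:

$ perl -MDevel::Size=size,total_size -le '
    my $data = { list => [ 1..100 ], name => "x" x 1000 };
    print size($data);        # the hash itself, not the things it refers to
    print total_size($data);  # the hash plus everything it references'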
21. Arenas
Heads and Bodies are allocated from ‘arenas’ (slabs) managed by perl.
One for SV heads and one for each size of SV body.
More efficient than malloc in space and speed.
Introspect arenas with Devel::Arena and Devel::Gladiator.
$ perl -MDevel::Gladiator=arena_table -e 'warn arena_table()'
ARENA COUNTS:
1063 SCALAR
199 GLOB
120 ARRAY
95 CODE
66 HASH
...
22. Key Notes
✦ All variable length data storage comes from malloc
  ✦ malloc has overheads, bucket and fragmentation issues
✦ Heads and Bodies are allocated from ‘arenas’ managed by perl
  ✦ Arenas have less overhead but are never freed
✦ Memory usage will always be higher than the sum of the sizes.
24. Memory Profiling?
✦ Track memory size over time?
✦ See where memory is allocated and freed?
  ✦ Experiments with Devel::NYTProf
  ✦ Turned out to not be very useful
✦ Need to know what is ‘holding’ memory.
25. Space in Hiding
✦ Perl tends to consume extra memory to save time
✦ This can lead to surprises, for example:

sub foo {
    my $var = "X" x 10_000_000;
}
foo();    # ~20MB still used after return!

sub bar {
    my $var = "X" x 10_000_000;
    bar($_[0]-1) if $_[0]; # recurse
}
bar(50);  # ~1GB still used after return!
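One way to avoid that surprise (my note, not from the slide) is to undef the large lexical before returning, so the pad entry doesn't keep the buffer cached between calls:

sub foo {
    my $var = "X" x 10_000_000;
    # ... use $var ...
    undef $var;   # releases the ~10MB buffer instead of caching it in the pad
}
foo();            # memory goes back to malloc (though not necessarily to the OS)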
26. X-Ray Vision!
✦ Want to see inside the black box
✦ Want to know “where memory is being held”
✦ A snapshot “crawl and dump” approach
✦ Separate capture from analysis
27. My Plan (circa 2012)
✦ Extend Devel::Size
✦ Add a C-level callback hook
✦ Add some kind of "data path name" mechanism
✦ Add a function to return the size of everything
✦ Stream the data to disk
✦ Write tools to manipulate, summarize & query the data
✦ Write tools to visualize the data
✦ Write tools to compare sets of data
29. Devel::SizeMe Outputs
✦ Text - handy for testing and simple structures
✦ Graphviz - useful visualization for up to ~1000 nodes
✦ Treemap - useful for simple top-down view (“blame”)
✦ Gephi - full network view (structure, relationships)
✦ SQLite db
✦ Very little analysis implemented yet
✦ Ref-loops are isolated from “owners”
33. Devel::SizeMe Summary
✦ Focussed on memory use
✦ Walks trees of pointers in perl internals
✦ Can dump individual data structures
✦ Stream-based - scales to any size of application
✦ Multiple output formats
✦ Very minimal and informal data model
34. Current Limitations
✦ Very minimal and informal data model
✦ Ref loops get separated out
✦ Accumulating sizes up the tree happens too soon
✦ Can’t edit the tree without invalidating sizes
✦ Needs a multi-phase processing pipeline
✦ Needs a more task-oriented user interface
35. Recommendations
✦ Store the data in some kind of database
✦ Perform transformations on the database data
✦ Generate UI from the database - scalability
✦ Express queries as db queries - flexibility
✦ What kind of database? Relational or Graph?
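As a sketch of the relational option (purely illustrative; this is not Devel::SizeMe's actual schema, and the table and column names are made up), DBI plus SQLite is enough to store the node tree and answer "which nodes hold the most memory" with plain SQL:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=sizeme.db", "", "",
                       { RaiseError => 1 });

# Hypothetical schema: one row per node in the size tree.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER,
        name      TEXT,
        self_size INTEGER    -- bytes attributed to this node alone
    )
});

# Example query: the ten largest direct children of the root node (id 1).
my $rows = $dbh->selectall_arrayref(q{
    SELECT name, self_size FROM node
    WHERE parent_id = 1
    ORDER BY self_size DESC LIMIT 10
});
printf "%-30s %10d\n", @$_ for @$rows;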
36. Possible Futures
✦ Feed Devel::MAT data into SQLite
✦ Feed SQLite data into Neo4j
✦ Develop useful Cypher query fragments
✦ Develop graph simplifications as plugins
✦ Develop visualizations