Debugging Tools & Techniques for Persistent Memory Programming

Eduardo Berrocal García de Carellán
eduardo.berrocal@Intel.com
04/17/2019

SPDK, PMDK & VTune™ Summit
Agenda
• Introduction
• Hardware Issues
• Error Injection
• Software Bugs
• Conclusions and Q&A

Introduction
Persistent memory programming introduces new opportunities…
• Byte addressable
• Cache coherent
• Load/Store access
• No page caching
• Memory-like performance
[Figure: memory and storage hierarchy, from top to bottom: DRAM, NVDIMM, Intel® Optane™ SSD, PCIe SSD, SATA SSD, HDD, tape]

Introduction (cont'd)
…but also new challenges
• Programming difficulty
• New classes of bugs
• New vectors to consider for performance
[Figure: memory and storage hierarchy, from top to bottom: DRAM, NVDIMM, Intel® Optane™ SSD, PCIe SSD, SATA SSD, HDD, tape]

Hardware Issues
• Module failure
• HDDs and SSDs use RAID
• Memory controllers do not implement RAID
• Data integrity
• Block corruption (i.e., bad blocks)
[Figure: Cascade Lake processor and its integrated memory controllers (IMC)]

Data Integrity
• What if we discover that our pool is corrupted due to bad blocks?
• Opening a corrupted pool
• Pool gets corrupted while program is running
$ ./myProgram /mnt/pmem/poolfile
/mnt/pmem/poolfile: Input/output error
Bus error (core dumped)

Data Integrity (cont'd)
• What if we discover that our pool is corrupted due to bad blocks?
• pmempool check will also fail
$ cp /mnt/pmem/poolfile ./
cp: error reading '/mnt/pmem/poolfile': Input/output error
# dd if=/dev/pmem0 of=/dev/null
dd: error reading /dev/pmem0: Input/output error
20480+0 records in
20480+0 records out
10485760 bytes (10 MB) copied, 0.0912348 s, 115 MB/s
$ pmempool check -v /mnt/pmem/poolfile
Bus error (core dumped)

Data Integrity (cont'd)
• Cleaning the poison by writing to the affected blocks
# cat /sys/block/pmem0/badblocks
20480 1
# dd conv=notrunc if=/dev/zero of=/dev/pmem0 oflag=direct bs=512 seek=20480 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000114311 s, 9.0 MB/s
# cat /sys/block/pmem0/badblocks
#
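
The `badblocks` entry and the `dd` parameters above are related by simple arithmetic. The following is a minimal sketch, assuming each entry has the form `<first_sector> <count>` in 512-byte sectors, as the `bs=512 seek=20480 count=1` invocation suggests. The struct and function names are hypothetical helpers for illustration, not part of any PMDK or ndctl API.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: badblocks entries are expressed in 512-byte sectors. */
#define SECTOR_SIZE 512ULL

/* One entry of /sys/block/<dev>/badblocks: "<first_sector> <count>". */
struct badblock_range {
    unsigned long long first_sector;
    unsigned long long count;
};

/* Parse one line such as "20480 1". Returns 0 on success, -1 on error. */
int parse_badblock_line(const char *line, struct badblock_range *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%llu %llu", &out->first_sector, &out->count) != 2)
        return -1;
    return out->count > 0 ? 0 : -1;
}

/* Byte offset on the device where the bad range starts. */
unsigned long long badblock_byte_offset(const struct badblock_range *r)
{
    return r->first_sector * SECTOR_SIZE;
}

/* Number of bytes that must be rewritten to cover the bad range. */
unsigned long long badblock_byte_length(const struct badblock_range *r)
{
    return r->count * SECTOR_SIZE;
}
```

For the entry `20480 1`, the offset works out to 20480 × 512 = 10485760 bytes, which is exactly where the earlier `dd if=/dev/pmem0` read stopped.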

Tool: pmempool
• Standalone utility for management and off-line analysis of persistent memory pools
• It works with both single-file pools and pool set files
• Commands: create, info, dump, check, rm, convert, sync and transform
• usage: pmempool [--version] [--help] <command> [<args>]

pmempool (check)
[Figure: pool files, each made of a header followed by data]
• Data: applications are responsible for correcting these errors
• Headers: we can use pmempool to check (and sometimes correct) corruptions in the headers
$ pmempool check -v /mnt/pmem/poolfile
checking pool header
incorrect pool header
/mnt/pmem/poolfile: not consistent

pmempool (check), cont'd
$ pmempool check -v -r -N -a /mnt/pmem/poolfile
checking pool header
incorrect pool header
pool_hdr.signature is not valid. Do you want to set it to PMEMBLK? [Y/n] Y
pool_hdr.major is not valid. Do you want to set it to default value 0x1? [Y/n] Y
setting pool_hdr.signature to PMEMBLK
setting pool_hdr.major to 0x1
invalid pool_hdr.poolset_uuid. Do you want to set it to 2a3b402a-2be0-46f0-a86d-7afef54b258a from BTT Info? [Y/n] Y
setting pool_hdr.poolset_uuid to 2a3b402a-2be0-46f0-a86d-7afef54b258a
invalid pool_hdr.checksum. Do you want to regenerate checksum? [Y/n] Y
setting pool_hdr.checksum to 0xb199cec3475bbf3a
checking pmemblk header
pmemblk header correct
checking BTT Info headers
arena 0: BTT Info header checksum correct
checking BTT Map and Flog
arena 0: checking BTT Map and Flog
/mnt/pmem/poolfile: repaired
• Attempting repairs (-r [-N -a])

PMDK Feature: Pool Sets
• In PMDK (tools and libraries), pool sets are equivalent to regular pools
• Pool extensions
• Pool replication
• Local
• Remote
$ cat my_extended_pool.set
PMEMPOOLSET
100G /mountpoint0/myfile.part0
$ cat my_local_replica.set
PMEMPOOLSET
100G /mountpoint0/myfile
REPLICA
100G /mountpoint1/myreplica
$ cat my_remote_replica.set
PMEMPOOLSET
100G /mountpoint0/myfile
REPLICA user@example.com myremotepool.set
[Figure: the parts of a pool set (data0, data1, data2, each with its own header) combine into one pool; a replica holds a second copy of the pool's header and data]

pmempool (sync)
$ pmempool check -v mypool.set
replica 0 part 0: checking pool header
replica 0 part 0: incorrect pool header
mypool.set: not consistent
$ pmempool sync -v mypool.set
mypool.set: synchronized
$ pmempool check -v mypool.set
replica 0: checking shutdown state
replica 0: shutdown state correct
replica 1: checking shutdown state
replica 1: shutdown state correct
replica 0 part 0: pool header correct
replica 1 part 0: pool header correct
mypool.set: consistent
• Repairing a corrupted replica

Error Injection
• Using sysfs
# filefrag -v -b512 /mnt/pmem/poolfile | grep -E "^[ ]+[0-9]+.*" | head -1 | awk '{ print $4 }' | cut -d. -f1
278528
# echo 278528 1 > /sys/block/pmem0/badblocks
• What about ndctl?
# ndctl inject-error --block=1 --count=2 namespace0.0

Software Bugs
• Persistent memory leaks
• Non-persistent stores
• Stores not added into a transaction
• Memory added to two different transactions
• Memory overwrites
• Unnecessary flushes
• Out-of-order stores

Persistent Memory Leaks
• Volatile programs treat leaks mainly as a performance problem
• In persistent programs, we also need to think about data corruption/loss
• The good: we can recover leaks
• The bad: garbage collection is not supported natively in PMDK yet
• The ugly: we need to use macros to access this API
[Figure: pool layout with a header, a root object, objects obj0 … objn-1, objn, and a separate object objk]

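persistent_ptr<my_type> my_data_structure = proot->ds;
PMEMoid raw_root = my_data_structure.raw ();
PMEMoid raw_obj;
/* Walk every object allocated in the pool */
POBJ_FOREACH (pop.get_handle (), raw_obj)
{
    /* Only consider objects with the same type number as the data structure */
    if (pmemobj_type_num (raw_obj) == pmemobj_type_num (raw_root)) {
        /* Re-attach objects that the data structure no longer references */
        if (my_data_structure.is_missing (raw_obj) == true) {
            my_data_structure.add_missing (raw_obj);
        }
    }
}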
• All allocated objects in libpmemobj are always added to an internal list

• API (libpmemobj) for this internal list:
• POBJ_FIRST (pop, t)
• POBJ_NEXT (o)
• POBJ_FOREACH (pop, varoid)
• POBJ_FOREACH_SAFE (pop, varoid, nvaroid)
• POBJ_FOREACH_TYPE (pop, var)
• POBJ_FOREACH_SAFE_TYPE (pop, var, nvar)

Tool: pmemcheck
• New Valgrind* tool developed by Intel®
• You also need an enhanced version of Valgrind* supporting CLFLUSHOPT and CLWB
• Go to https://github.com/pmem/valgrind
• usage: valgrind --tool=pmemcheck [<options>] <program> [<args>]
• More info: valgrind --tool=pmemcheck --help

pmemcheck (without PMDK)
• Pmemcheck does not have a way to know which memory addresses are persistent and which ones are volatile
data = (int *)mmap (NULL, size, PROT_READ|PROT_WRITE,
                    MAP_SHARED, fd, 0);
VALGRIND_PMC_REGISTER_PMEM_MAPPING (data, size);
...
munmap (data, size);
VALGRIND_PMC_REMOVE_PMEM_MAPPING (data, size);
• Pmemcheck does not know where a transaction starts and ends
VALGRIND_PMC_START_TX;
...
VALGRIND_PMC_END_TX;

Tool: Intel® Inspector - Persistent Inspector
• Specifically tailored for persistent memory
• Included as part of Intel® Inspector 2019
• Intel® Inspector is available with Intel® Parallel Studio XE and Intel® System Studio
• Analysis
1. pmeminsp cb -pmem-file <pmem_file_path> -- <writer_program> [<params>]
2. pmeminsp ca -pmem-file <pmem_file_path> -- <reader_program> [<params>]
3. pmeminsp rp -- <writer_program> <reader_program>

Non-Persistent Stores
• Data written to persistent memory but not flushed correctly
• Data not flushed may still sit in the CPU caches and could be lost if the process crashes
writer () {
    var1 = "Hello world to PMEM!";
    flush (var1);
    var1_valid = True;
    flush (var1_valid);
}
reader () {
    if (var1_valid == True) {
        print (var1);
    }
}

[Figure: the writer/reader code above, annotated with a write dependency between var1_valid and var1]
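
The writer/reader pseudocode can be approximated in portable C. The sketch below is an illustration under stated assumptions: it uses a file-backed `MAP_SHARED` mapping and `msync()` as a stand-in for a real persistent memory flush (such as CLWB or CLFLUSHOPT followed by a fence), and the `struct record` layout and the `map_record` helper are hypothetical. What it shows is the ordering: the payload is flushed before the validity flag is set and flushed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical layout: payload first, validity flag second. */
struct record {
    char var1[64];
    int  var1_valid;
};

/* Map a file-backed record (MAP_SHARED, page-aligned). NULL on failure. */
struct record *map_record(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct record)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct record), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Stand-in flush: write the mapped record back with msync(). */
static int flush_record(struct record *rec)
{
    return msync(rec, sizeof(*rec), MS_SYNC);
}

/* Store the payload, flush it, and only then publish the valid flag. */
int writer(struct record *rec, const char *msg)
{
    assert(rec != NULL && msg != NULL);
    if (strlen(msg) >= sizeof(rec->var1))
        return -1;
    strcpy(rec->var1, msg);
    if (flush_record(rec) != 0)
        return -1;
    rec->var1_valid = 1;
    return flush_record(rec);
}

/* Return the payload only if the flag says it is valid, else NULL. */
const char *reader(const struct record *rec)
{
    assert(rec != NULL);
    return rec->var1_valid ? rec->var1 : NULL;
}
```

Dropping the first `flush_record` call in `writer` reproduces the bug class of this slide: the flag could reach the media while the payload is still only in the cache.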

$ valgrind --tool=pmemcheck ./test1w_
==28699== pmemcheck-1.0, a simple persistent store checker
==28699== Copyright (c) 2014-2016, Intel Corporation
==28699== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==28699== Command: ./test1w_
==28699==
==28699==
==28699== Number of stores not made persistent: 2
==28699== Stores not made persistent properly:
==28699== [0] at 0x400931: main (test1w_.c:16)
==28699== Address: 0x4023000 size: 4 state: DIRTY
==28699== [1] at 0x400927: main (test1w_.c:15)
==28699== Address: 0x4023004 size: 4 state: DIRTY
==28699== Total memory not made persistent: 8
==28699== ERROR SUMMARY: 2 errors
• pmemcheck

$ pmeminsp rp -- listing_8-16 listing_8-17
#===============================================================================
# Diagnostic # 1: Missing cache flush
#-------------------
The first memory store
of size 4 at address 0x7F9C68893004 (offset 0x4 in /mnt/pmem/file)
in /data/listing_8-16!main at listing_8-16.c:13 - 0x67D
in /lib64/libc.so.6!__libc_start_main at <unknown_file>:<unknown_line> - 0x223D3
in /data/listing_8-16!_start at <unknown_file>:<unknown_line> - 0x534
is not flushed before
the second memory store
in /data/listing_8-16!main at listing_8-16.c:14 - 0x687
while
memory load from the location of the first store
in /data/listing_8-17!main at listing_8-17.c:13 - 0x6C8
depends on
memory load from the location of the second store
in /data/listing_8-17!main at listing_8-17.c:12 - 0x6BD
#===============================================================================
#-------------------
Memory store
memory is unmapped
Analysis complete. 2 diagnostic(s) reported.
• Persistent Inspector

Stores Not Added into a Transaction
TX_BEGIN (pop) {
    TOID (struct my_root) root = POBJ_ROOT (pop, struct my_root);
    TX_ADD_FIELD (root, value);
    D_RW (root)->value = 4;
    D_RW (root)->is_odd = D_RO (root)->value % 2; /* is_odd was never added to the transaction */
} TX_END
• Within a transaction, it is assumed that every PMEM location being modified has been added to the transaction beforehand
• This allows the transaction to flush these locations at the end, or to roll back to the old values in the event of an unexpected failure
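
To see why a store outside the undo log is a problem, the rollback mechanism can be imitated in plain C. This is a toy sketch, not libpmemobj: `toy_tx`, `toy_tx_add`, and `toy_tx_abort` are hypothetical names, and the log lives in volatile memory. It only illustrates that a rollback restores the ranges that were snapshotted and nothing else.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the root object used in the slide. */
struct my_root {
    int value;
    int is_odd;
};

#define MAX_SNAPSHOTS 8
#define MAX_RANGE 64

/* One undo-log entry: where the range lives and what it held before. */
struct snapshot {
    void *addr;
    size_t len;
    unsigned char old[MAX_RANGE];
};

/* Toy transaction: just a list of snapshots. */
struct toy_tx {
    struct snapshot log[MAX_SNAPSHOTS];
    int count;
};

void toy_tx_begin(struct toy_tx *tx)
{
    tx->count = 0;
}

/* Snapshot a range before it is modified (the role TX_ADD_FIELD plays). */
void toy_tx_add(struct toy_tx *tx, void *addr, size_t len)
{
    assert(tx->count < MAX_SNAPSHOTS && len <= MAX_RANGE);
    struct snapshot *s = &tx->log[tx->count++];
    s->addr = addr;
    s->len = len;
    memcpy(s->old, addr, len);
}

/* Roll back: restore every snapshotted range, newest first. */
void toy_tx_abort(struct toy_tx *tx)
{
    for (int i = tx->count - 1; i >= 0; i--)
        memcpy(tx->log[i].addr, tx->log[i].old, tx->log[i].len);
    tx->count = 0;
}
```

If only `value` is added and both fields are then modified, an abort restores `value` but leaves the new `is_odd` in place, so the two fields no longer agree.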

Stores Not Added into a Transaction (cont'd)
$ valgrind --tool=pmemcheck ./listing_8-25
==48660== Command: ./listing_8-25
==48660==
==48660==
==48660== Stores not made persistent properly:
==48660== [0] at 0x400C2D: main (listing_8-25.c:18)
==48660== Address: 0x7dc0554 size: 4 state: DIRTY
==48660== Total memory not made persistent: 4
==48660==
==48660== Number of stores made without adding to transaction: 1
==48660== Stores made without adding to transactions:
==48660== [0] at 0x400C2D: main (listing_8-25.c:18)
==48660== Address: 0x7dc0554 size: 4
$ pmeminsp cb -pmem-file /mnt/pmem/pool -- ./listing_8-25
++ Analysis starts
++ Analysis completes
++ Data is stored in folder "/data/.pmeminspdata/data/listing_8-25"
$
$ pmeminsp rp -- ./listing_8-25
#===============================================================================
# Diagnostic # 1: Store without undo log
#-------------------
Memory store
of size 4 at address 0x7FAA84DC0554 (offset 0x3C0554 in /mnt/pmem/pool)
in /data/listing_8-25!main at listing_8-25.c:18 - 0xC2D
is not undo logged in
transaction
in /data/listing_8-25!main at listing_8-25.c:14 - 0xB67
• pmemcheck • Persistent Inspector

Stores Added to Two Different Transactions
• Adding the same object to multiple transactions can corrupt data
• In PMDK, the library maintains a transaction per thread

Stores Added to Two Different Transactions (cont'd)
$ valgrind --tool=pmemcheck ./test8b
==42444== Command: ./test8b
==42444==
==42444==
==42444==
==42444== Number of overlapping regions registered in different transactions: 1
==42444== Overlapping regions:
==42444== [0] at 0x4E6ADCC: pmemobj_tx_add_snapshot (tx.c:1080)
==42444== by 0x4E6B2FC: pmemobj_tx_add_common.constprop.18 (tx.c:1168)
==42444== by 0x4E6C38B: pmemobj_tx_add_range (tx.c:1352)
==42444== by 0x400C48: func (test8b.c:15)
==42444== by 0x4C2DDD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==42444== by 0x517EEAC: clone (in /usr/lib64/libc-2.17.so)
==42444== Address: 0x7dc0550 size: 8 tx_id: 2
==42444== First registered here:
==42444== [0]' at 0x4E6ADCC: pmemobj_tx_add_snapshot (tx.c:1080)
==42444== by 0x4E6B2FC: pmemobj_tx_add_common.constprop.18 (tx.c:1168)
==42444== by 0x4E6C38B: pmemobj_tx_add_range (tx.c:1352)
==42444== by 0x400D81: main (test8b.c:26)
==42444== Address: 0x7dc0550 size: 8 tx_id: 1
$ pmeminsp cb -pmem-file /mnt/pmem/file -- ./test8b
...
$ pmeminsp ca -pmem-file /mnt/pmem/file -- ./test8b
...
$ pmeminsp rp -- ./test8b
#===============================================================================
# Diagnostic # 1: Overlapping regions registered in different transactions
#-------------------
transaction
in /mnt/hgfs/workbench/pmemcheck-test/test8/test8b!main at test8b.c:24 - 0xD1E
in /mnt/hgfs/workbench/pmemcheck-test/test8/test8b!_start at <unknown_file>:<unknown_line> - 0x9F4
protects
memory region
in /mnt/hgfs/workbench/pmemcheck-test/test8/test8b!main at test8b.c:26 - 0xD7D
in /mnt/hgfs/workbench/pmemcheck-test/test8/test8b!_start at <unknown_file>:<unknown_line> - 0x9F4
overlaps with
memory region
in /mnt/hgfs/workbench/pmemcheck-test/test8/test8b!func at test8b.c:15 - 0xC44
in /lib64/libpthread.so.0!start_thread at <unknown_file>:<unknown_line> - 0x7DCD
in /lib64/libc.so.6!__clone at <unknown_file>:<unknown_line> - 0xFDEAB

Memory Overwrites
• This refers to the case where multiple modifications to the same persistent memory location occur before the location is made persistent
• This issue is mostly related to performance, although it can also uncover a lack of flushing
• In general, it is always better to use volatile memory for short-lived data
...
persistent_data = 10;
persistent_data *= 2;
flush (&persistent_data);
...

Memory Overwrites (cont'd)
$ valgrind --tool=pmemcheck --mult-stores=yes ./test2w_
==121362== Command: ./test2w_
==121362==
==121362==
==121362==
==121362== Number of overwritten stores: 1
==121362== Overwritten stores before they were made persistent:
==121362== [0] at 0x40097C: main (test2w_.c:23)
==121362== Address: 0x4023004 size: 4 state: DIRTY

Unnecessary Flushes
• Flushing should be done carefully
• Detecting unnecessary flushes (such as redundant ones) can help improve code performance
...
persistent_data = computation ();
...

Unnecessary Flushes (cont'd)
$ valgrind --tool=pmemcheck --flush-check=yes ./test3b
==54720== Command: ./test3b
==54720==
==54720==
==54720==
==54720== Number of unnecessary flushes: 1
==54720== [0] at 0x400868: flush (emmintrin.h:1459)
==54720== by 0x400989: main (test3b.c:22)
==54720== Address: 0x4023000 size: 64
$ pmeminsp rp -- ./test3w_ ./test3r_
#===============================================================================
# Diagnostic # 1: Redundant cache flush
#-------------------
Cache flush
of size 64 at address 0x7F10DD6D6000 (offset 0x0 in /mnt/pmem/file)
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!flush at test3w_.c:11 - 0x674
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!main at test3w_.c:24 - 0x73F
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!_start at <unknown_file>:<unknown_line> - 0x574
is redundant with regard to
cache flush
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!flush at test3w_.c:11 - 0x674
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!main at test3w_.c:25 - 0x750
of
memory store
in /mnt/hgfs/workbench/pmemcheck-test/test3/test3w_!main at test3w_.c:23 - 0x72D

Out-of-Order Analysis
• Data written but not flushed explicitly can still be flushed out by the CPU
• Bugs can arise when data is not written to persistent media in the order expected
writer () {
    pcounter = 0;
    for (i=0; i<max; i++) {
        pcounter++;
        if (rand() % 2 == 0) {
            pcells[i].data = data();
            flush (pcells[i].data);
            pcells[i].valid = True;
        } else {
            pcells[i].valid = False;
        }
        flush (pcells[i].valid);
    }
    flush (pcounter);
}
reader () {
    for (i=0; i<pcounter; i++) {
        if (pcells[i].valid == True) {
            print (pcells[i].data);
        }
    }
}

Out-of-Order Analysis (cont'd)
$ pmeminsp rp -check-out-of-order-store -- ./test9w_ ./test9r_
#===============================================================================
# Diagnostic # 1: Out-of-order stores
#-------------------
Memory store
of size 4 at address 0x7F6979541000 (offset 0x0 in /mnt/pmem/file)
in /mnt/hgfs/workbench/pmemcheck-test/test9/test9w_!main at test9w_.c:30 - 0x72C
<unknown_file>:<unknown_line> - 0x5C4
is out of order with respect to
memory store
of size 1 at address 0x7F697954107F (offset 0x7F in /mnt/pmem/file)
in /mnt/hgfs/workbench/pmemcheck-test/test9/test9w_!main at test9w_.c:36 - 0x7AE
<unknown_file>:<unknown_line> - 0x5C4

Out-of-Order Analysis (cont'd)
• pmemcheck + pmreorder
$ valgrind --tool=pmemcheck -q --log-stores=yes --log-stores-stacktraces=yes --log-stores-stacktraces-depth=2 --print-summary=yes --log-file=store_log.log ./program
$ pmreorder -l store_log.log -o output_file.log -x pmem_memset_persist=NoReorderNoCheck -r ReorderFull -c prog -p ./program_checker
$ cat output_file.log
WARNING:pmreorder:File /mnt/pmem/file inconsistent
WARNING:pmreorder:Call trace:
Store [0]:
by 0x401D0C: main (test9bw_.cpp:55)
• -p: you need a checker returning 0 if data is consistent, or 1 otherwise
• -x: assign an engine type to a specific marker (set by macros)
• Engines: NoReorderNoCheck, NoReorderDoCheck, ReorderFull, ReorderPartial, ReorderAccumulative, ReorderReverseAccumulative
Backup
