Track out of memory

When it comes to memory in HPC, one of the main issues that can arise is having the application killed on the compute node because it ran out of memory.

MALT cannot be triggered at the exact moment the out-of-memory condition occurs, because at that point neither the application nor the instrumentation tool can do anything other than be killed.

However, MALT can trigger a dump of the profile shortly before that point by relying on software thresholds, so that you can analyse the profile post-mortem as usual.
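To illustrate the idea of a software threshold, here is a minimal sketch in Python. It is not MALT's actual implementation: it assumes a Linux-style /proc/meminfo layout, and the function names and sample values are hypothetical and only serve the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def system_usage_fraction(meminfo):
    """Fraction of system memory in use, from MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def should_dump(meminfo, threshold):
    """True once usage reaches the software threshold (0.8 means 80%)."""
    return system_usage_fraction(meminfo) >= threshold


# Hypothetical sample standing in for the content of /proc/meminfo.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    # 14000000 of 16000000 kB are in use (87.5%), above the 80% threshold.
    print(should_dump(info, 0.80))  # prints True
```

The point of the sketch is that the check fires while the process is still alive and able to write a profile, which is no longer true once the kernel has decided to kill it.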

Available options

The dump triggers are configured through the options of the dump group.
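As a rough sketch of how options from the dump group might be passed on the command line, the helper below builds an argument list. It assumes that MALT accepts options as `group:name=value` entries joined by `;` after a `-o` flag, and that boolean options take `true`; check both against your installed version. The helper name `build_malt_command` and the program `./my_app` are hypothetical.

```python
import shlex


def build_malt_command(program_args, dump_options):
    """Build an argument list running a program under MALT with dump options.

    Assumption: options are given as "dump:name=value" entries joined by ';'
    and passed through a '-o' flag.
    """
    option_string = ";".join(
        f"dump:{name}={value}" for name, value in dump_options.items()
    )
    return ["malt", "-o", option_string, *program_args]


if __name__ == "__main__":
    cmd = build_malt_command(
        ["./my_app", "--size", "1000"],  # hypothetical application
        {"on-sys-full-at": "80%", "watch-dog": "true"},
    )
    # Print a shell-quoted version that can be pasted into a job script.
    print(shlex.join(cmd))
```

Building the command as a list and quoting it only for display avoids shell-quoting mistakes around the `;` and `%` characters.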

Table of metrics

Option                  Short description
on-signal               Dump on signal. Can be a comma-separated list from SIGINT, SIGUSR1,
after-seconds           Dump after X seconds (limited to a single dump).
on-sys-full-at          Dump when system memory becomes full at x%, xG, xM, xK, x (empty to disable).
on-app-using-rss        Dump when the RSS of the app reaches the given limit in %, G, M, K (empty to disable).
on-app-using-virt       Dump when the virtual memory of the app reaches the limit in %, G, M, K (empty to disable).
on-app-using-req        Dump when the requested memory of the app reaches the limit in %, G, M, K (empty to disable).
on-thread-stack-using   Dump when one stack reaches the limit in %, G, M, K (empty to disable).
on-alloc-count          Dump when the number of allocations reaches the limit in G, M, K (empty to disable).
watch-dog               Run an active thread that continuously monitors the memory of the app, rather than checking only occasionally.
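To make the limit notation concrete (x%, xG, xM, xK, x, or empty to disable), here is a small Python sketch that converts such a string into a number of bytes. It is an illustration rather than MALT's own parser: it assumes binary multiples (1K = 1024) and that a percentage is taken relative to a reference size supplied by the caller. The function name `parse_threshold` is hypothetical.

```python
def parse_threshold(text, total_bytes=None):
    """Convert 'x%', 'xG', 'xM', 'xK' or 'x' into bytes.

    Returns None for an empty string, meaning the trigger is disabled.
    Assumption: K, M and G are binary multiples (1024-based).
    """
    text = text.strip()
    if not text:
        return None
    units = {"K": 1024, "M": 1024**2, "G": 1024**3}
    suffix = text[-1].upper()
    if suffix == "%":
        if total_bytes is None:
            raise ValueError("a percentage needs a reference size")
        return int(total_bytes * float(text[:-1]) / 100)
    if suffix in units:
        return int(float(text[:-1]) * units[suffix])
    return int(text)


if __name__ == "__main__":
    total = 8 * 1024**3  # hypothetical node with 8 GiB of memory
    for limit in ["", "50%", "4G", "512M", "100K", "2048"]:
        print(repr(limit), "->", parse_threshold(limit, total))
```

With these assumptions, "50%" of an 8 GiB node and "4G" resolve to the same value, which is a quick way to sanity-check a limit before submitting a job.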

Look in the profile

When looking at the profile, you can get the memory used at peak time from the Global Peak Memory metric.