What to explore with NUMAPROF

Code annotation

NUMAPROF provides code annotations to map allocation metrics directly onto your source code.

Thread statistics

NUMAPROF reports statistics about memory accesses for each thread.

Thread pinning history

Tracks the full thread pinning history and related statistics to help you better understand your profile.

Access matrix

Provides a global and a per-thread access matrix to better understand how accesses are balanced across your NUMA nodes.
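As a rough illustration of what such a matrix holds, the sketch below counts accesses by (node of the accessing thread, node holding the page). The function name `build_access_matrix` and the sample data are hypothetical and are not taken from NUMAPROF itself.

```python
def build_access_matrix(accesses, num_nodes):
    """Count accesses per (thread node, page node) pair.

    `accesses` is an iterable of (thread_node, page_node) tuples.
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for thread_node, page_node in accesses:
        matrix[thread_node][page_node] += 1
    return matrix


# Hypothetical sample: two NUMA nodes, five recorded accesses.
sample = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0)]
matrix = build_access_matrix(sample, num_nodes=2)

# Row = node of the accessing thread, column = node holding the page.
for row in matrix:
    print(row)
```

In this layout the diagonal holds local accesses and every off-diagonal cell holds remote accesses, so a well-balanced, well-placed run shows most of its weight on the diagonal.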

Report on allocation site

Reports remote and local accesses both at the access site and at the allocation site, to quickly find the code segments that generate remote accesses.
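To make the idea concrete, here is a minimal sketch that groups accesses by allocation site and computes the fraction that is remote. The record layout `(site, thread_node, page_node)`, the function name `remote_ratio_by_site` and the site names are assumptions made for this example, not NUMAPROF's internal representation.

```python
from collections import defaultdict


def remote_ratio_by_site(records):
    """Return {site: remote_fraction} from (site, thread_node, page_node) records."""
    counts = defaultdict(lambda: [0, 0])  # site -> [local, remote]
    for site, thread_node, page_node in records:
        is_remote = thread_node != page_node
        counts[site][1 if is_remote else 0] += 1
    return {
        site: remote / (local + remote)
        for site, (local, remote) in counts.items()
    }


# Hypothetical records: allocation site, node of the thread, node of the page.
records = [
    ("main.c:12", 0, 0),
    ("main.c:12", 1, 0),
    ("main.c:12", 1, 0),
    ("solver.c:40", 1, 1),
]
ratios = remote_ratio_by_site(records)

# The site with the highest remote fraction is the first candidate to inspect.
worst = max(ratios, key=ratios.get)
print(worst, ratios[worst])
```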

Per-line details

When you hover over an access line, you get all the details for that line, with charts.

Assembly annotation

In addition to the source code annotation, NUMAPROF annotates the assembly code so you can understand accesses at a fine grain.

KNL MCDRAM

NUMAPROF provides a special instrumentation to track Intel KNL MCDRAM accesses.

Track thread and page pinning

NUMAPROF tracks the pinning of threads and pages to detect unpinned (undefined) behavior that can be improved.
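As an illustration of what "pinned" means here, the sketch below checks whether a thread's CPU affinity set fits inside a single NUMA node. The helper `pinned_node` and the two-node topology are hypothetical; this is not how NUMAPROF itself performs the check. Only `os.sched_getaffinity` is a real call, and it is available on Linux only.

```python
import os


def pinned_node(affinity, node_cpus):
    """Return the NUMA node whose CPUs contain the whole affinity set, or None."""
    for node, cpus in node_cpus.items():
        if affinity and affinity <= cpus:
            return node
    return None


# Hypothetical topology: 2 NUMA nodes with 4 CPUs each.
topology = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

print(pinned_node({4, 5}, topology))        # affinity fits inside node 1
print(pinned_node({0, 1, 4, 5}, topology))  # affinity spans both nodes

# On Linux, the real affinity of the current process can be queried like this.
if hasattr(os, "sched_getaffinity"):
    print(pinned_node(set(os.sched_getaffinity(0)), topology))
```

A thread whose affinity spans several nodes can migrate between them, so whether its accesses are local or remote depends on where the scheduler happens to run it.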

KCachegrind

NUMAPROF can convert its native output format to a callgrind-compatible format.

About NUMAPROF

Web-based GUI

NUMAPROF uses a web-based GUI hosted by an embedded server written in Python. The GUI uses JavaScript libraries such as jQuery and D3.js.

JSON based format

NUMAPROF uses a JSON file format, so its output can be reused by other software without much effort. The drawback is that it produces larger files.
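Because the profile is plain JSON, any language with a JSON parser can read it. The sketch below uses Python's standard `json` module on an inline string; the keys `threads`, `id` and `remote` are invented for this example, since the real NUMAPROF schema is not described here.

```python
import json

# Hypothetical profile content; the real NUMAPROF schema is not shown here.
profile_text = '{"threads": [{"id": 0, "remote": 12}, {"id": 1, "remote": 3}]}'

profile = json.loads(profile_text)

# Sum a per-thread counter across all threads.
total_remote = sum(thread["remote"] for thread in profile["threads"])
print(sorted(profile.keys()), total_remote)
```

For a real profile, replace the inline string with the content of the output file and inspect its top-level keys first to discover the actual layout.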

Ignore local stack

To run faster, NUMAPROF does not instrument local stack accesses, which greatly reduces the overhead.

Track first touch

NUMAPROF tracks the first touch, which determines the page location, and reports it in the profile.
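The effect of first touch can be sketched with a small simulation: under a first-touch policy, a page is placed on the node of the thread that touches it first, and later touches do not move it. The function `first_touch_placement` and the sample trace are hypothetical and only model the policy; they do not reproduce NUMAPROF's tracking.

```python
def first_touch_placement(touches):
    """Map each page to the NUMA node of the first thread that touched it.

    `touches` is an ordered iterable of (page, thread_node) tuples.
    """
    placement = {}
    for page, thread_node in touches:
        # setdefault keeps the first node seen; later touches do not move the page.
        placement.setdefault(page, thread_node)
    return placement


# Hypothetical trace: page 0 is touched by node 0 first, then by node 1.
touches = [(0, 0), (1, 0), (0, 1), (2, 1)]
placement = first_touch_placement(touches)
print(placement)
```

This is why an array initialized by a single thread tends to end up on one node, even when it is later processed by threads running on all nodes.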

Native parallel execution

Unlike Valgrind, and thanks to pintool, NUMAPROF runs in parallel and is only slowed down by the binary instrumentation.

40x overhead

The typical overhead of NUMAPROF is about 40x, depending on how much your program accesses its stack or heap.