Handle NVIDIA UMA memory fallback by redyuan43 · Pull Request #466 · Syllo/nvtop

redyuan43 · 2026-04-24T11:10:07Z

Summary

This PR improves NVIDIA UMA/iGPU memory reporting for platforms where NVML does not expose dedicated framebuffer memory. On these systems NVML can return NVML_ERROR_NOT_SUPPORTED for nvmlDeviceGetMemoryInfo*() because the GPU shares system memory instead of having a dedicated framebuffer.

The concrete hardware used to reproduce and validate this change is:

- NVIDIA DGX Spark
- GPU: NVIDIA GB10
- Driver: 580.142
- CUDA: 13.0
- Architecture: unified memory / iGPU-style platform

Problem

On DGX Spark / GB10, nvidia-smi reports framebuffer memory as unsupported:

FB Memory Usage
    Total : N/A
    Used  : N/A
    Free  : N/A

Before this change, nvtop detected the unified-memory case but populated only total_memory from /proc/meminfo. Used and free memory were left unset, so memory reporting remained incomplete and the overall memory meter could not show how much UMA memory was used or available.

Changes

- Add a small NVIDIA backend fallback that reads /proc/meminfo when NVML reports unified memory or unsupported framebuffer memory.
- Populate:
  - total_memory from MemTotal
  - free_memory from MemAvailable
  - used_memory as MemTotal - MemAvailable
  - mem_util_rate from the same values
- Fix an accumulation bug in the generic process-memory fallback: used_memory was added to itself before each process's usage was added, instead of accumulating only the per-process usage.
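A minimal sketch of the /proc/meminfo fallback described above, for illustration only. It assumes the kernel reports MemTotal and MemAvailable in kB, and it stores the results in a hypothetical `struct uma_memory` whose field names mirror the ones listed in the PR; the struct, the function names, the line-by-line `sscanf` parsing, and the integer-truncated percentage are assumptions, not the code merged here. The kB values in the usage example are chosen to match the byte counts in the Validation snapshot (the MemFree line is made up).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container; nvtop keeps these fields in its own structs. */
struct uma_memory {
    uint64_t total_memory;   /* bytes, from MemTotal */
    uint64_t free_memory;    /* bytes, from MemAvailable */
    uint64_t used_memory;    /* bytes, MemTotal - MemAvailable */
    unsigned mem_util_rate;  /* percent, integer-truncated (assumption) */
};

/* Parse the text of /proc/meminfo; sizes there are reported in kB. */
static bool uma_memory_from_meminfo_text(const char *text, struct uma_memory *out) {
    unsigned long long total_kb = 0, avail_kb = 0;
    bool have_total = false, have_avail = false;
    const char *line = text;

    while (line && *line) {
        unsigned long long value;
        if (sscanf(line, "MemTotal: %llu kB", &value) == 1) {
            total_kb = value;
            have_total = true;
        } else if (sscanf(line, "MemAvailable: %llu kB", &value) == 1) {
            avail_kb = value;
            have_avail = true;
        }
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line)
            line++;
    }

    /* Refuse to report anything if a counter is missing or inconsistent. */
    if (!have_total || !have_avail || total_kb == 0 || avail_kb > total_kb)
        return false;

    out->total_memory = total_kb * 1024ULL;
    out->free_memory = avail_kb * 1024ULL;
    out->used_memory = out->total_memory - out->free_memory;
    out->mem_util_rate = (unsigned)(out->used_memory * 100 / out->total_memory);
    return true;
}

/* Read the live counters; MemTotal/MemAvailable sit near the top of the file. */
static bool uma_memory_from_proc(struct uma_memory *out) {
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return false;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return uma_memory_from_meminfo_text(buf, out);
}
```

Parsing from a string keeps the arithmetic testable on any machine; on a UMA system, `uma_memory_from_proc` would supply the live values. With `MemTotal: 127534484 kB` and `MemAvailable: 114139704 kB`, the parser yields 130595311616 bytes total, 13716254720 bytes used, and a 10% utilization rate.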

This keeps the change scoped to the existing UMA path and does not alter normal discrete NVIDIA GPU memory reporting.
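The PR describes the process-memory accumulation bug only in words. The sketch below assumes it took the form `used_memory += used_memory + usage` inside a per-process loop and contrasts it with the corrected sum. `struct proc_usage` and both function names are hypothetical, not nvtop's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process record; nvtop's real process struct differs. */
struct proc_usage {
    uint64_t gpu_memory_usage; /* bytes */
};

/* Assumed buggy pattern: the running total is added to itself on every
 * iteration, so the result roughly doubles per process. */
static uint64_t sum_usage_buggy(const struct proc_usage *procs, size_t n) {
    uint64_t used_memory = 0;
    for (size_t i = 0; i < n; i++)
        used_memory += used_memory + procs[i].gpu_memory_usage;
    return used_memory;
}

/* Corrected pattern: add only each process's own usage. */
static uint64_t sum_usage_fixed(const struct proc_usage *procs, size_t n) {
    uint64_t used_memory = 0;
    for (size_t i = 0; i < n; i++)
        used_memory += procs[i].gpu_memory_usage;
    return used_memory;
}
```

For three processes using 1, 2, and 3 bytes, the corrected sum is 6, while the assumed buggy form returns 11 (1, then 1 + 1 + 2 = 4, then 4 + 4 + 3 = 11).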

Validation

Built and tested on the DGX Spark / GB10 machine:

build/src/nvtop: ELF 64-bit LSB pie executable, ARM aarch64
nvtop version 3.3.2

nvtop --snapshot after the change:

"device_name": "NVIDIA GB10"
"mem_util": "10%"
"mem_total": "130595311616"
"mem_used": "13716254720"
"mem_free": "116879056896"
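The snapshot fields are internally consistent with the formulas listed under Changes. The displayed 10% matches integer truncation of the ratio; whether nvtop truncates or rounds is an assumption here, although rounding would have shown 11%.

```math
\begin{aligned}
\mathrm{mem\_used} &= \mathrm{mem\_total} - \mathrm{mem\_free} = 130595311616 - 116879056896 = 13716254720\ \mathrm{B} \\
\mathrm{mem\_util} &= \frac{13716254720}{130595311616} \times 100 \approx 10.50\% \;\rightarrow\; 10\%
\end{aligned}
```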

The interactive UI was also tested in a DGX Spark desktop session. The memory meter changed from N/A to a UMA system memory value, for example:

13.045Gi/121.626Gi
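The "Gi" suffix denotes binary gibibytes (2^30 bytes). Converting the snapshot's byte counts confirms that the total matches the UI exactly. The UI's used value (13.045 Gi) differs from the snapshot's mem_used, presumably because it was sampled at a different moment.

```math
\frac{130595311616\ \mathrm{B}}{2^{30}\ \mathrm{B/GiB}} \approx 121.626\ \mathrm{GiB},
\qquad
\frac{13716254720\ \mathrm{B}}{2^{30}\ \mathrm{B/GiB}} \approx 12.774\ \mathrm{GiB}
```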

Local build validation was also run on an x86_64 NVIDIA system with discrete RTX 3060 GPUs to ensure the normal NVML framebuffer path still builds and reports discrete GPU memory through NVML.

Not addressed / expected remaining N/A fields

This PR intentionally does not attempt to synthesize metrics that NVML and nvidia-smi still do not expose on DGX Spark / GB10. On the tested machine these remain N/A in nvidia-smi as well:

Fan Speed              : N/A
Tx Throughput          : N/A
Rx Throughput          : N/A
Memory Clock           : N/A
Power Limit            : N/A

Those are left unchanged because they require driver/platform telemetry support or another documented data source. This PR only addresses the UMA memory reporting case where NVIDIA documentation recommends estimating memory resources from Linux system memory counters rather than relying on framebuffer memory.

Syllo · 2026-04-29T05:36:59Z

Thanks a lot

Handle NVIDIA UMA memory fallback

fe7b793

Syllo merged commit 4c56f89 into Syllo:master Apr 29, 2026
3 checks passed

Syllo mentioned this pull request Apr 29, 2026

Fix NVML memory reporting regression on coherent UMA platforms (Fixes… #463

Open

Syllo merged 1 commit into Syllo:master from redyuan43:fix-dgx-spark-uma-memory