Handle NVIDIA UMA memory fallback #466
Merged
Syllo merged 1 commit into `Syllo:master` on Apr 29, 2026
Owner: Thanks a lot
Summary
This PR improves NVIDIA UMA/iGPU memory reporting on platforms where NVML does not expose dedicated framebuffer memory. On these systems, NVML can return `NVML_ERROR_NOT_SUPPORTED` for `nvmlDeviceGetMemoryInfo*()` because the GPU shares system memory instead of having a dedicated framebuffer.

The concrete hardware used to reproduce and validate this change is:
Problem
On DGX Spark / GB10, `nvidia-smi` reports framebuffer memory as unsupported:

Before this change, nvtop detected the unified-memory case but only populated `total_memory` from `/proc/meminfo`. As a result, the UI still had incomplete memory reporting, and the overall memory meter could not show the available/used UMA memory state.

Changes
- Fill the memory fields from `/proc/meminfo` when NVML reports unified memory / unsupported framebuffer memory:
  - `total_memory` from `MemTotal`
  - `free_memory` from `MemAvailable`
  - `used_memory` as `MemTotal - MemAvailable`
  - `mem_util_rate` from the same values
- Fix a bug where `used_memory` was added to itself before adding each process usage.

This keeps the change scoped to the existing UMA path and does not alter normal discrete NVIDIA GPU memory reporting.
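A minimal C sketch of the field mapping listed above, assuming a hypothetical `struct uma_memory` in place of nvtop's own dynamic-info structure (which also tracks per-field validity). `/proc/meminfo` reports values in kB (KiB), so they are scaled to bytes here, and the utilization is an integer percentage, truncated. The second function shows a correct per-process accumulation; the buggy `used += used + usage[i]` form mentioned in its comment is an assumption based on the wording of the fix, not a quote of the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the UMA memory fields (not nvtop's real struct). */
struct uma_memory {
  uint64_t total_memory;  /* bytes */
  uint64_t free_memory;   /* bytes */
  uint64_t used_memory;   /* bytes */
  unsigned mem_util_rate; /* percent, 0-100, truncated */
};

/* Map MemTotal/MemAvailable (both in kB from /proc/meminfo) onto the fields.
 * Returns 0 on success, -1 if the counters are inconsistent. */
static int uma_memory_from_meminfo(uint64_t mem_total_kb, uint64_t mem_available_kb,
                                   struct uma_memory *out) {
  if (mem_total_kb == 0 || mem_available_kb > mem_total_kb)
    return -1;
  out->total_memory = mem_total_kb * 1024;     /* MemTotal */
  out->free_memory = mem_available_kb * 1024;  /* MemAvailable */
  out->used_memory = out->total_memory - out->free_memory;
  out->mem_util_rate = (unsigned)(out->used_memory * 100 / out->total_memory);
  return 0;
}

/* Sum per-process memory usage. An accumulation of the form
 * `used += used + usage[i]` would double the running total at every step;
 * the correct form adds only each process's own usage. */
static uint64_t sum_process_memory(const uint64_t *usage, size_t count) {
  uint64_t used = 0;
  for (size_t i = 0; i < count; ++i)
    used += usage[i];
  return used;
}
```

For example, `MemTotal = 1000 kB` and `MemAvailable = 250 kB` give 768000 bytes used out of 1024000, i.e. 75% utilization.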
Validation
Built and tested on the DGX Spark / GB10 machine:
`nvtop --snapshot` after the change:

The interactive UI was also tested in the DGX Spark desktop session. The memory meter changed from `N/A` to a UMA system memory value, for example:

Local build validation was also run on an x86_64 NVIDIA system with discrete RTX 3060 GPUs to ensure the normal NVML framebuffer path still builds and reports discrete GPU memory through NVML.
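The meter's switch from `N/A` to a value can be illustrated with a small formatting helper. This is a hypothetical sketch: the `valid` flag stands in for nvtop's per-field validity tracking, and the `used/total` GiB label format is an assumption, not nvtop's actual meter text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a memory meter label into buf.
 * Invalid or empty fields (the pre-change state on GB10) render as "N/A";
 * otherwise used/total is shown in GiB with one decimal place. */
static void format_memory_meter(bool valid, uint64_t used_bytes, uint64_t total_bytes,
                                char *buf, size_t len) {
  if (!valid || total_bytes == 0) {
    snprintf(buf, len, "N/A");
    return;
  }
  const double gib = 1024.0 * 1024.0 * 1024.0;
  snprintf(buf, len, "%.1fGi/%.1fGi", (double)used_bytes / gib, (double)total_bytes / gib);
}
```

With 3 GiB used out of 4 GiB the label reads `3.0Gi/4.0Gi`; with the fields unset it stays `N/A`.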
Not addressed / expected remaining N/A fields
This PR intentionally does not attempt to synthesize metrics that NVML and `nvidia-smi` still do not expose on DGX Spark / GB10. On the tested machine, these remain `N/A` in `nvidia-smi` as well:

Those are left unchanged because they require driver/platform telemetry support or another documented data source. This PR only addresses the UMA memory reporting case, where NVIDIA documentation recommends estimating memory resources from Linux system memory counters rather than relying on framebuffer memory.
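A sketch of the data-source choice described here, under stated assumptions: the `SKETCH_NVML_*` codes are hypothetical stand-ins for NVML's `nvmlReturn_t` values (real code includes `nvml.h`), and treating only a "not supported" answer as the trigger is a simplification, since the PR also mentions detecting unified memory directly. `read_meminfo` is a hypothetical helper that reads the two Linux counters from any stream in `/proc/meminfo` format.

```c
#include <stdio.h>

/* Hypothetical stand-ins for NVML return codes. */
typedef enum {
  SKETCH_NVML_SUCCESS,
  SKETCH_NVML_ERROR_NOT_SUPPORTED,
  SKETCH_NVML_ERROR_OTHER
} sketch_nvml_status;

/* Only an explicit "not supported" answer from the framebuffer memory query
 * selects the system-memory estimate; other errors leave the fields as N/A. */
static int use_system_memory_counters(sketch_nvml_status fb_query_status) {
  return fb_query_status == SKETCH_NVML_ERROR_NOT_SUPPORTED;
}

/* Read MemTotal and MemAvailable (reported in kB, i.e. KiB) from a stream in
 * /proc/meminfo format. Returns 0 when both counters were found, -1 otherwise. */
static int read_meminfo(FILE *f, unsigned long long *total_kb,
                        unsigned long long *available_kb) {
  char line[256];
  int found = 0; /* bit 0: MemTotal seen, bit 1: MemAvailable seen */
  while (fgets(line, sizeof line, f) != NULL) {
    unsigned long long value;
    if (sscanf(line, "MemTotal: %llu kB", &value) == 1) {
      *total_kb = value;
      found |= 1;
    } else if (sscanf(line, "MemAvailable: %llu kB", &value) == 1) {
      *available_kb = value;
      found |= 2;
    }
  }
  return found == 3 ? 0 : -1;
}
```

On a real system the stream would come from `fopen("/proc/meminfo", "r")`, and the two values feed the `MemTotal`/`MemAvailable` mapping listed under Changes. Using `MemAvailable` rather than `MemFree` matters because it accounts for reclaimable page cache, which is closer to what the GPU can actually obtain on a shared-memory system.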