Add optional CUDA graph resource inspection for AI predecoder #505

Open
wsttiger wants to merge 3 commits into NVIDIA:main from wsttiger:add_ai_predecoder_cuda_graph_resources_output
@wsttiger
Collaborator

Add ai_predecoder_service::print_graph_resources() that walks the captured cuGraph and reports per-kernel grid/block dims, register usage, shared memory, and launch totals, plus a node-type summary.

Collection is opt-in via a new collect_resources parameter on capture_graph() because it uses the CUDA driver API to introspect TensorRT (TRT) kernels, which perturbs primary-context state and breaks DOCA-based GPU-RoCE on the FPGA bridge. Only the software benchmark exposes a --print-graph-resources flag; the FPGA bridge ignores it and prints a warning.
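
For readers who want a concrete picture of the "launch totals" and "node-type summary" mentioned above, here is a minimal sketch in plain C++. It is not the code from this PR: `dim3_like`, `kernel_resource_info`, `launch_totals`, `summarize()`, and `count_node_types()` are hypothetical names, and the sketch assumes the per-kernel values have already been read from the graph. It only shows one plausible way to aggregate them.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without CUDA.
struct dim3_like {
  unsigned x = 1, y = 1, z = 1;
};

// Hypothetical per-kernel record; field names are illustrative only.
struct kernel_resource_info {
  std::string name;
  dim3_like grid;
  dim3_like block;
  int regs_per_thread = 0;
  std::size_t static_smem_bytes = 0;
  std::size_t dynamic_smem_bytes = 0;
};

// Totals accumulated over every kernel node in the graph.
struct launch_totals {
  std::uint64_t blocks = 0;
  std::uint64_t threads = 0;
  std::uint64_t registers = 0;  // sum of threads * regs_per_thread
  std::uint64_t smem_bytes = 0; // sum of blocks * (static + dynamic) bytes
};

// Number of elements covered by a 3-D extent.
inline std::uint64_t volume(const dim3_like &d) {
  return static_cast<std::uint64_t>(d.x) * d.y * d.z;
}

// Accumulate launch totals from the per-kernel records.
launch_totals summarize(const std::vector<kernel_resource_info> &kernels) {
  launch_totals t;
  for (const auto &k : kernels) {
    const std::uint64_t blocks = volume(k.grid);
    const std::uint64_t threads = blocks * volume(k.block);
    t.blocks += blocks;
    t.threads += threads;
    t.registers += threads * static_cast<std::uint64_t>(k.regs_per_thread);
    t.smem_bytes += blocks * (k.static_smem_bytes + k.dynamic_smem_bytes);
  }
  return t;
}

// Count how many nodes of each type (e.g. "kernel", "memcpy") were seen.
std::map<std::string, int>
count_node_types(const std::vector<std::string> &node_types) {
  std::map<std::string, int> counts;
  for (const auto &type : node_types)
    ++counts[type];
  return counts;
}
```

In this sketch the register and shared-memory totals are simple sums across kernels, which is an assumption about what "launch totals" means; the actual report may define them differently.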


Signed-off-by: Scott Thornton <wsttiger@gmail.com>

The CUDA graph resource inspection added in this PR introduced CUDA
driver API calls (cuGraphKernelNodeGetParams_v2, cuFuncGetName,
cuFuncGetAttribute) inside ai_predecoder_service.cu. The new
unittests/realtime/ targets were updated to link CUDA::cuda_driver,
but the older test_realtime_pipeline target in
libs/qec/unittests/CMakeLists.txt (which also compiles
ai_predecoder_service.cu directly) was missed, causing undefined
reference errors in the standalone QEC CI builds
(amd64 12.6, amd64 13.0, arm64 13.0).

Add CUDA::cuda_driver to test_realtime_pipeline's link libraries.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

Decouple CUDA graph resource introspection from ai_predecoder_service
and expose it as free functions in a new graph_resources translation
unit.  Targets that want per-kernel grid/block/register/shared-memory
reporting now:

  1. call capture_graph(stream, device_launch, save_graph=true), which
     retains a cudaGraphClone of the captured template, and
  2. pass the cudaGraph_t returned by get_captured_graph() to the free
     functions collect_graph_resources() / print_graph_resources() in
     cudaq/qec/realtime/graph_resources.h.

Motivation: the driver-API calls (cuFuncGetAttribute, cuFuncGetName,
cuGraphKernelNodeGetParams_v2) required to introspect TRT-internal
kernels pulled libcuda.so.1 into every target that merely compiled
ai_predecoder_service.cu.  That broke QEC standalone CI builds whose
containers do not ship a GPU driver: test_realtime_pipeline's
gtest_discover_tests invocation failed at build time on
"libcuda.so.1: cannot open shared object file".

After this change the driver API is confined to graph_resources.cu,
which is only compiled into the benchmark target
(test_realtime_predecoder_w_pymatching).  test_realtime_pipeline and
hololink_predecoder_bridge no longer reference any cu*-prefixed
symbol and therefore no longer require CUDA::cuda_driver on their
link lines.  Verified with ldd: libcuda.so.1 is absent from
test_realtime_pipeline.

Additional cleanup:
- ai_predecoder_service no longer owns graph_resource_info and has
  no <iosfwd>/<string>/<vector> includes it does not use.
- The FPGA bridge still warns when --print-graph-resources is passed
  since driver-API introspection would perturb the CUDA context used
  by DOCA/Hololink GPU-RoCE.
- Reverts the earlier CUDA::cuda_driver link and DISCOVERY_MODE
  PRE_TEST workarounds on test_realtime_pipeline.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Collaborator

Can this be a .cpp file since there isn't any real CUDA code in here?
