feat: Add Alpamayo end-to-end Expert diffusion support#67

Open
Turoad wants to merge 1 commit into NVIDIA:main from Turoad:feature/alpamayo-expert-e2e

Conversation


@Turoad Turoad commented Apr 13, 2026

What does this PR do?

Type of change: New feature

Overview: Add end-to-end TensorRT inference for Alpamayo 1.5 (10B VLM + Diffusion Expert) autonomous driving model. After VLM decode produces a KV cache, the Expert runner performs 10-step flow-matching Euler integration to generate 6 candidate trajectories — all on GPU without host round-trips.

New files

  • AlpamayoExpertRunner (cpp/runtime/alpamayoExpertRunner.h/.cpp): Loads Expert TRT engine, manages GPU buffers (KV reshape, noisy action, timestep, attention mask), runs 10-step Euler diffusion
  • CUDA kernels (cpp/kernels/alpamayoExpertKernels/): kvCacheReshapeRepeat, buildPositionIds, fillTimestep, eulerUpdate
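As a rough illustration of what a kernel like kvCacheReshapeRepeat might compute, here is a CPU reference loop, assuming the kernel broadcasts the single-sequence VLM KV cache across the numCandidates diffusion sequences. The [heads, seqLen, headDim] layout and the function name are illustrative assumptions, not the engine's actual binding contract.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for a repeat-across-candidates KV copy. Assumed layout:
//   input  [numHeads, seqLen, headDim]               (one sequence from VLM decode)
//   output [numCandidates, numHeads, seqLen, headDim]
// The real kernel would do this on-GPU; this loop nest documents the indexing only.
std::vector<float> kvRepeatForCandidates(const std::vector<float>& kv,
                                         std::size_t numHeads,
                                         std::size_t seqLen,
                                         std::size_t headDim,
                                         std::size_t numCandidates) {
    const std::size_t perSeq = numHeads * seqLen * headDim;
    std::vector<float> out(numCandidates * perSeq);
    for (std::size_t c = 0; c < numCandidates; ++c)
        for (std::size_t i = 0; i < perSeq; ++i)
            out[c * perSeq + i] = kv[i];
    return out;
}
```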

Modified files

  • llmInferenceRuntime: Expert integration after VLM decode, StopAfterEOS for <traj_future_start> token, single-seq and multi-seq candidate modes, KV cache dump for offline validation
  • llm_inference.cpp: CLI flags --expertEngine, --numCandidates, --numDiffusionSteps, --multiSeq, --dumpKVCache
  • QwenViTRunner: preprocessPreparedVisual() for pre-processed multi-camera images (bypasses runtime image decoding)
  • CMakeLists: curand linkage, SM 110 (Jetson Thor) architecture
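The CMakeLists change could look roughly like the fragment below; the target name and surrounding layout are illustrative, since the actual CMakeLists is not shown in this summary.

```cmake
# Target Jetson Thor (SM 110) alongside any architectures already listed.
set(CMAKE_CUDA_ARCHITECTURES "110")

# Link cuRAND for the noisy-action initialization used by the Expert runner.
find_package(CUDAToolkit REQUIRED)
target_link_libraries(llm_inference PRIVATE CUDA::curand)
```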

Validation

Validated on Jetson AGX Thor with 300 samples:

| Config | Mean minADE | Median minADE | Steady-state latency |
| --- | --- | --- | --- |
| FP8 VLM + BF16 Expert 6x (recommended) | 0.799 m | 0.637 m | ~2.83 s/sample |
| PyTorch reference | 0.827 m | 0.700 m | 3.62 s/sample |

Usage

llm_inference \
  --engineDir <vlm_engine> \
  --multimodalEngineDir <visual_engine> \
  --expertEngine <expert_trt_engine> \
  --numCandidates 6 \
  --numDiffusionSteps 10 \
  --inputFile input.json \
  --outputFile output.json

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • Code formatted with clang-format (style=file)
  • codespell passed (0 errors)
  • License headers added (SPDX Apache-2.0)

🧪 Tests

  • Compiled and tested on Jetson AGX Thor (ARM64, CUDA 12.8, TRT 10.x)
  • 300-sample precision validation (minADE aligned with PyTorch reference)
  • Speed validation (no regression vs baseline)

📄 Documentation

  • CLI flags documented in commit message

⚙️ Compatibility

  • Backward compatible — Expert features are opt-in via --expertEngine flag
  • No changes to existing inference paths when Expert is not configured

Additional Information

Related issue: #32

Add end-to-end TensorRT inference for Alpamayo 1.5 (10B VLM + Diffusion
Expert) autonomous driving model. After VLM decode produces a KV cache,
the Expert runner performs 10-step flow-matching Euler integration to
generate 6 candidate trajectories — all on GPU without host round-trips.

New files:
- AlpamayoExpertRunner: loads Expert TRT engine, manages GPU buffers
  (KV reshape, noisy action, timestep, attention mask), runs diffusion
- CUDA kernels: kvCacheReshapeRepeat, buildPositionIds, fillTimestep,
  eulerUpdate

Modified files:
- llmInferenceRuntime: Expert integration after VLM decode, StopAfterEOS
  for traj_future_start token, single-seq and multi-seq candidate modes,
  KV cache dump for offline validation
- llm_inference.cpp: CLI flags --expertEngine, --numCandidates,
  --numDiffusionSteps, --multiSeq, --dumpKVCache
- QwenViTRunner: preprocessPreparedVisual for pre-processed multi-camera
  images (bypasses runtime image decoding)
- CMakeLists: curand linkage, SM 110 (Jetson Thor) architecture

Validated on Jetson AGX Thor with 300 samples: FP8 VLM + BF16 Expert 6x
steady-state ~2.83s/sample, mean minADE 0.799m, median 0.637m (aligned
with PyTorch reference mean 0.827m, median 0.700m).

Signed-off-by: thor <thor@nvidia.com>
@Turoad Turoad requested a review from a team April 13, 2026 12:19
@nvluxiaoz
Collaborator

Thanks a lot for this PR! The core team is also working on supporting Alpamayo in a new release. We will take this PR as a reference and properly cite this great contribution.

@Turoad
Author

Turoad commented Apr 21, 2026

Thanks for the response! Glad to hear the core team is also working on Alpamayo support.

I'd love to collaborate rather than have parallel efforts — a few thoughts:

  1. This PR is validated: 300-sample precision benchmark on Jetson AGX Thor, steady-state 2.83s/sample, aligned with PyTorch reference. Happy to share the full test harness and dataset scripts if helpful.

  2. Happy to adapt: If the core team's implementation has a different architecture or API design, I'm glad to refactor this PR to align with your internal conventions — just let me know the target structure.

  3. Incremental merge? If the full PR is too large to review at once, I can split it into smaller pieces (e.g., kernels first, then ExpertRunner, then integration).

What would make this most useful for the team — merge as-is, adapt to your internal branch, or contribute specific pieces? I'm flexible on the path, just want to make sure the validated work doesn't go to waste.

@genie-ahughes

@Turoad I'd love the test harness and dataset scripts you offered above. Trying to reproduce your end-to-end pipeline on a Jetson AGX Thor against nvidia/Alpamayo-R1-10B. Anywhere you can drop them (gist, branch on the fork, attachment) would be hugely appreciated — particularly the action-expert ONNX export script so the engine I/O matches your alpamayoExpertRunner.cpp contract (single fused kv_cache binding etc.). Thanks!
