
Add SP benchmarks for GPT-OSS-120B MoE model (GEMM+RS, RS+RMSNorm, E2E) #513

Open: aamarnat wants to merge 2 commits into main from aamarnat/sp_benchmarks


aamarnat (Collaborator) commented Apr 21, 2026

Summary

Add three Sequence Parallelism (SP) benchmarks targeting the GPT-OSS-120B MoE model. These cover the SP segment between attention output and MoE input:

O_proj (row-parallel): [M, K_local] x [K_local, N] → partial [M, N]
  → Reduce-Scatter → [M/tp, N]
  → RMSNorm on [M/tp, N]
  → (MoE receives [M/tp, N] directly and handles its own all-gather (AG) at exit)

K = 64 heads × 64 head_dim = 4096 (so K_local = K/tp per rank), N = 2880 (hidden size). M values: 32 (decode), 896 (hybrid), 2048 (prefill).
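To make the shape bookkeeping above concrete, here is a minimal single-process NumPy model of the segment. It is an illustrative sketch, not the iris or torch.distributed code path: the `tp` ranks are simulated with array slices, the helper names (`rms_norm`, `simulate_sp_segment`) are hypothetical, the RMSNorm `eps=1e-6` is an assumed value, and small dimensions are used so it runs instantly (the real K=4096, N=2880 can be plugged in). It checks that row-parallel partial GEMMs, a reduce-scatter, and a per-shard RMSNorm reproduce the unsharded result.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over the hidden (last) dimension; eps is an assumed value
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def simulate_sp_segment(M, K, N, tp, seed=0):
    """Single-process model of O_proj (row-parallel) -> reduce-scatter -> RMSNorm."""
    assert K % tp == 0 and M % tp == 0, "K and M must be divisible by tp"
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((M, K))      # attention output (unsharded)
    w = rng.standard_normal((K, N))      # O_proj weight
    gamma = rng.standard_normal(N)       # RMSNorm weight
    k_loc, m_loc = K // tp, M // tp

    # Rank r multiplies its K/tp column slice of x by its K/tp row slice of w,
    # producing a partial [M, N] result that must be summed across ranks.
    partials = [x[:, r * k_loc:(r + 1) * k_loc] @ w[r * k_loc:(r + 1) * k_loc]
                for r in range(tp)]

    # Reduce-scatter: sum the partials; rank r keeps rows [r*M/tp, (r+1)*M/tp).
    summed = np.sum(partials, axis=0)
    shards = [summed[r * m_loc:(r + 1) * m_loc] for r in range(tp)]

    # RMSNorm is row-wise, so each rank normalizes its [M/tp, N] shard locally.
    normed = [rms_norm(s, gamma) for s in shards]

    reference = rms_norm(x @ w, gamma)   # unsharded ground truth
    return reference, normed

ref, normed = simulate_sp_segment(M=32, K=256, N=64, tp=4)
print(normed[0].shape)                            # (8, 64)
print(np.allclose(np.concatenate(normed), ref))   # True
```

Because RMSNorm only reduces over the hidden dimension, no extra communication is needed between the reduce-scatter and the norm, which is what lets the MoE block consume [M/tp, N] directly.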

New files

  • benchmark/ops/bench_matmul_reduce_scatter_stages.py — Fused GEMM + Reduce-Scatter stage profiler. Compares unfused torch.mm + dist.reduce_scatter_tensor against iris matmul_reduce_scatter with tile-config sweep (bm × bn). Models the O_proj row-parallel GEMM followed by RS.

  • benchmark/ops/bench_rs_rmsnorm.py — RS + RMSNorm stage profiler. Compares unfused NCCL reduce_scatter_tensor + aiter Triton RMSNorm against an iris shmem "fused-ready" variant (same ops but buffers allocated in iris shared memory, compatible with aiter fused kernel calling convention).

  • benchmark/ops/bench_sp_layer_e2e.py — End-to-end SP segment benchmark combining all three stages (GEMM + RS + RMSNorm). Compares unfused torch.mm + dist.reduce_scatter_tensor + RMSNorm against fused iris matmul_reduce_scatter + RMSNorm, with tile-config sweep.
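The bm × bn tile-config sweep in the GEMM+RS and E2E benchmarks interacts with these specific shapes. The sketch below counts output tiles for the per-rank [M, N] partial and the fraction of tile area covering real outputs. The candidate (bm, bn) pairs and the `tile_grid` helper are hypothetical, not the values the benchmarks actually sweep. Two arithmetic facts fall out: 2880 = 45 × 64, so bn = 64 tiles N exactly while bn = 128 leaves a partial edge column, and at decode (M = 32) any bm above 32 leaves rows of every tile idle.

```python
import math

def tile_grid(M, N, bm, bn):
    """Return (number of output tiles, fraction of tile area covering real outputs)."""
    tiles_m, tiles_n = math.ceil(M / bm), math.ceil(N / bn)
    utilization = (M * N) / (tiles_m * bm * tiles_n * bn)
    return tiles_m * tiles_n, utilization

N = 2880  # O_proj output width (hidden size)
# Hypothetical (bm, bn) candidates; the benchmarks' actual sweep list may differ.
candidates = [(32, 64), (64, 64), (128, 128), (256, 128)]

for M in (32, 896, 2048):
    for bm, bn in candidates:
        tiles, util = tile_grid(M, N, bm, bn)
        print(f"M={M:5d}  bm={bm:3d} bn={bn:3d}  tiles={tiles:4d}  util={util:.3f}")
```

For example, at M = 32 both (32, 64) and (64, 64) produce 45 tiles, but the latter fills only half of each tile.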

All three benchmarks use the iris.bench framework, sweep across TP degrees (2, 4, 8 ranks), and report FLOPs and communication bytes.
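The PR does not show how the FLOP and byte counts are computed, so the following is only a rough per-rank estimate under stated assumptions: bf16 activations (2 bytes per element), 2 FLOPs per multiply-accumulate, and a ring reduce-scatter in which each rank sends tp − 1 chunks of [M/tp, N]. NCCL/RCCL may pick other algorithms and the fused iris path may move data differently. `segment_costs` is a hypothetical helper.

```python
def segment_costs(M, tp, K=4096, N=2880, elem_bytes=2):
    """Rough per-rank costs of one SP segment (assumed accounting; bf16 by default)."""
    assert K % tp == 0 and M % tp == 0
    gemm_flops = 2 * M * (K // tp) * N                     # [M, K/tp] x [K/tp, N], 2 FLOPs per MAC
    rs_input_bytes = M * N * elem_bytes                    # partial [M, N] buffer fed to reduce-scatter
    rs_sent_bytes = (tp - 1) * (M // tp) * N * elem_bytes  # ring reduce-scatter: tp-1 chunks of [M/tp, N]
    return gemm_flops, rs_input_bytes, rs_sent_bytes

for tp in (2, 4, 8):
    for M in (32, 896, 2048):
        flops, rs_in, rs_sent = segment_costs(M, tp)
        print(f"tp={tp} M={M:5d}  GEMM {flops / 1e9:7.3f} GFLOP  "
              f"RS input {rs_in / 2**20:6.2f} MiB  RS sent {rs_sent / 2**20:6.2f} MiB")
```

Per-rank GEMM work shrinks as 1/tp while the partial [M, N] buffer each rank feeds into the reduce-scatter stays the same size, which is why the communication share of the segment grows with TP degree.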

Test plan

  • Run benchmark/ops/bench_matmul_reduce_scatter_stages.py with 2, 4, 8 ranks
  • Run benchmark/ops/bench_rs_rmsnorm.py with 2, 4, 8 ranks
  • Run benchmark/ops/bench_sp_layer_e2e.py with 2, 4, 8 ranks

github-actions bot added the in-progress (We are working on it) and iris (Iris project issue) labels Apr 21, 2026
aamarnat and others added 2 commits April 21, 2026 20:27
aamarnat force-pushed the aamarnat/sp_benchmarks branch from 30878cf to 0dae5f3 April 21, 2026 20:27