Update MiniMax M2.5 FP8 H200 vLLM agg recipes #1298

Open
anish-shanbhag wants to merge 1 commit into SemiAnalysisAI:main from anish-shanbhag:ashan/port-inferencemax-53-minimax-h200-no-slurm-shared
Conversation

anish-shanbhag commented May 7, 2026

Update MiniMax-M2.5 FP8 H200 vLLM to vllm/vllm-openai:v0.20.1-ubuntu2404

Set vLLM serving knobs in benchmarks/single_node/minimaxm2.5_fp8_h200.sh: the benchmark-generated max-model-len, the previous eval max-model-len handling, the fp8 KV cache, FlashInfer attention and autotuning, the Triton MoE backend, and MiniMax QK-norm fusion.
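As a rough illustration of what these knobs look like, here is a minimal launch sketch. The `--max-model-len` and `--kv-cache-dtype` flags and the `VLLM_ATTENTION_BACKEND` variable are standard vLLM options; the model ID, port, and context-length value below are placeholders, not taken from this PR, and the FlashInfer autotune, Triton MoE, and QK-norm fusion switches are omitted because their exact flags are version-specific.

```shell
# FlashInfer attention backend (a standard vLLM environment variable).
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Hypothetical invocation; the recipe's actual values live in
# benchmarks/single_node/minimaxm2.5_fp8_h200.sh.
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --max-model-len 65536 \
  --kv-cache-dtype fp8 \
  --port 8000
```

The fp8 KV cache halves KV memory relative to fp16, which is typically what makes long max-model-len values fit on H200-class GPUs.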

anish-shanbhag marked this pull request as ready for review May 8, 2026 23:48
anish-shanbhag requested a review from a team May 8, 2026 23:48
Contributor

@claude (bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

anish-shanbhag force-pushed the ashan/port-inferencemax-53-minimax-h200-no-slurm-shared branch from e601235 to d601a12 on May 9, 2026 00:10
Labels

None yet

Projects

Status: No status

1 participant