Add pod-based crank scheduler for simplified benchmark scheduling by LoopedBard3 · Pull Request #2167 · aspnet/Benchmarks

LoopedBard3 · 2026-04-29T21:33:51Z

Summary

Simplified alternative to PR #2106's full crank-scheduler. Uses a pod model where machines are fixed groups (SUT + load + DB) instead of individual machines with capability scoring and preferred partners.

Motivation

The existing manual scheduling (Excel sheet -> matrix YAML -> web tool -> pipeline YAML) is error-prone and leaves machines underutilized. PR #2106 introduced a full scheduler (~2000 lines) but its complexity (capabilities, priorities, preferred_partners, machine_groups, profile_overrides) may be more than needed given that machines are best modeled as fixed pods.

Approach

A pod is a fixed group of machines that always run together:

- SUT (System Under Test) - required
- Load generator - optional
- DB (Database) - optional
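The pod shape described above could be modeled roughly as follows. This is a minimal sketch, not the code in scripts/pod-scheduler/models.py: the `Pod` field names, the `machines()` helper, and the machine names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Pod:
    """A fixed group of machines that always run together (hypothetical shape)."""
    name: str
    sut: str                      # System Under Test, required
    load: Optional[str] = None    # load generator, optional
    db: Optional[str] = None      # database, optional

    def machines(self) -> Set[str]:
        """Physical machines this pod occupies while one of its runs is active."""
        return {m for m in (self.sut, self.load, self.db) if m is not None}


# Hypothetical pods: the second one reuses the first pod's SUT machine.
full = Pod("example-full", sut="machine-a", load="machine-b", db="machine-c")
sut_only = Pod("example-sut-only", sut="machine-a")

# A non-empty intersection means the two pods cannot run at the same time.
print(bool(full.machines() & sut_only.machines()))  # True
```

Keeping the machine set on the pod itself is what lets the scheduler treat "shared machine" as a plain set intersection instead of scoring capabilities.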

The scheduler:

1. Expands scenarios x pods into runs
2. Sorts by runtime (longest-job-first)
3. Greedily packs into stages checking physical machine collisions
4. Splits across multiple YAML files with balanced runtime
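The first three steps of that list can be sketched as below. This is an illustration under stated assumptions, not the code in scripts/pod-scheduler/scheduler.py: the `Run` tuple, the `expand` and `pack` names, the first-fit placement rule, and the sample pods and scenarios are all hypothetical. Splitting across YAML files is left out.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Run(NamedTuple):
    scenario: str
    pod: str
    runtime: int                 # estimated minutes
    machines: FrozenSet[str]     # physical machines the pod occupies


def expand(scenarios: Dict[str, dict], pods: Dict[str, FrozenSet[str]]) -> List[Run]:
    """One run per (scenario, pod) pair listed on the scenario."""
    return [
        Run(name, pod, spec["runtime"], pods[pod])
        for name, spec in scenarios.items()
        for pod in spec["pods"]
    ]


def pack(runs: List[Run]) -> List[List[Run]]:
    """Longest job first; each run goes into the first stage with no machine collision."""
    stages: List[List[Run]] = []
    busy: List[set] = []  # machines already claimed in each stage
    # Name-based tie-breakers keep the output deterministic.
    ordered = sorted(runs, key=lambda r: (-r.runtime, r.scenario, r.pod))
    for run in ordered:
        for i, used in enumerate(busy):
            if not (used & run.machines):
                stages[i].append(run)
                used |= run.machines
                break
        else:
            # Every existing stage collides, so open a new one.
            stages.append([run])
            busy.append(set(run.machines))
    return stages


# Hypothetical input: pod-a and pod-b share the physical machine m2.
pods = {
    "pod-a": frozenset({"m1", "m2"}),
    "pod-b": frozenset({"m3", "m2"}),
    "pod-c": frozenset({"m4"}),
}
scenarios = {
    "long": {"runtime": 90, "pods": ["pod-a", "pod-b"]},
    "short": {"runtime": 30, "pods": ["pod-a", "pod-c"]},
}
stages = pack(expand(scenarios, pods))
for number, stage in enumerate(stages, start=1):
    print(number, [(r.scenario, r.pod) for r in stage])
```

With this input the two `long` runs land in separate stages because both pods need `m2`, while `short` on `pod-c` joins the first stage since `m4` is free.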

What's Included

- scripts/pod-scheduler/ - 5 Python files + README (~570 lines total)
- build/benchmarks_ci_pods.json - CI config (6 pods, 84 runs)
- build/benchmarks_ci_azure_pods.json - Azure config (9 pods, 26 runs, includes merged eastus2)
- build/benchmarks_ci_cobalt_pods.json - Cobalt hosted config (4 pods, 44 runs)

Key Differences from PR #2106

| | PR #2106 | This PR |
|---|---|---|
| Code size | ~2000 lines | ~570 lines |
| Concepts | capabilities, priorities, preferred_partners, machine_groups, profile_overrides | pods, scenarios |
| Config | 13 machine defs with nested capabilities | flat pod definitions |

Run Count Parity with Main

| Pipeline | Pod Scheduler | Main Branch |
|---|---|---|
| CI (01+02) | 84 | 84 |
| Azure | 26 | 26 |
| Cobalt | 44 | 44 |

Simplified alternative to PR aspnet#2106's full crank-scheduler. Uses a pod model where machines are fixed groups (SUT + load + DB) instead of individual machines with capability scoring and preferred partners.

Key simplifications:
- Pods define fixed machine groupings (no role priority/scoring)
- Shared machines between pods handled via collision detection
- Same greedy longest-job-first bin-packing algorithm
- Same Liquid template YAML generation
- ~570 lines vs ~2000 lines in the full scheduler

Includes:
- scripts/pod-scheduler/ (5 Python files + README)
- build/benchmarks_ci_pods.json (pod-based config for CI benchmarks)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pod-based configurations for all three additional CI environments:
- benchmarks_ci_azure_pods.json: 6 pods, 14 runs (matches main)
- benchmarks_ci_azure_eastus2_pods.json: 2 pods, 12 runs (matches main)
- benchmarks_ci_cobalt_pods.json: 4 pods, 44 runs (matches main)

Notable pod patterns:
- Azure IDNA pods cross-use each other as load machines
- Cobalt hosted has 28-core variant pods sharing physical machines with full-core pods (handled by collision detection)
- Azure eastus2 pods share load/db, serialized automatically

Also fixes unicode bar chars for Windows compatibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Generated via:

    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build
    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_azure_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-azure
    python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_cobalt_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-cobalt

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new, simplified “pod-based” benchmark scheduler that expands scenario×pod runs, greedily packs them into non-conflicting parallel stages, and generates Azure DevOps pipeline YAML(s) from JSON configs.

Changes:

- Introduces scripts/pod-scheduler/ (models, config loading, scheduling, YAML generation, CLI + README).
- Adds pod-based benchmark configurations for CI, Azure, and Cobalt hosted pipelines under build/*_pods.json.
- Implements schedule splitting into multiple YAML files with staggered cron offsets.
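The splitting and cron staggering mentioned in the last item could look roughly like this. It is only a sketch under assumptions: `split_balanced` and `offset_cron_hour` are hypothetical names, the least-loaded-file heuristic is one plausible way to balance runtime, and the rule that only a plain integer hour field can be offset is an assumption about how the stagger is applied. The stage durations and the cron string are made up.

```python
import re
from typing import List


def split_balanced(stage_minutes: List[int], parts: int) -> List[List[int]]:
    """Assign stage indices to `parts` files: longest stage first, always to the lightest file."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    buckets: List[List[int]] = [[] for _ in range(parts)]
    totals = [0] * parts
    order = sorted(range(len(stage_minutes)), key=lambda i: (-stage_minutes[i], i))
    for i in order:
        target = totals.index(min(totals))  # first file with the smallest total
        buckets[target].append(i)
        totals[target] += stage_minutes[i]
    return buckets


def offset_cron_hour(cron: str, hours: int) -> str:
    """Shift the hour field of a 5-field cron expression, rejecting non-integer hours."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    hour = fields[1]
    if not re.fullmatch(r"[0-9]{1,2}", hour) or int(hour) > 23:
        # Lists, ranges and steps such as "*/12" cannot be shifted by simple addition.
        raise ValueError(f"unsupported hour field {hour!r} in {cron!r}")
    fields[1] = str((int(hour) + hours) % 24)
    return " ".join(fields)


buckets = split_balanced([90, 60, 45, 30, 30], parts=2)
print(buckets)                             # [[0, 3], [1, 2, 4]]
print(offset_cron_hour("0 6 * * *", 12))   # 0 18 * * *
```

Note that wrapping past midnight with `% 24` does not adjust the day fields, so a real implementation would need to decide how to handle schedules that restrict the day of week.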

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

| File | Description |
|---|---|
| scripts/pod-scheduler/scheduler.py | Core greedy longest-job-first stage packing + schedule splitting. |
| scripts/pod-scheduler/models.py | Pod/scenario/run/stage/schedule data model definitions. |
| scripts/pod-scheduler/main.py | CLI entry point + human-readable summaries (utilization, stages, conflicts). |
| scripts/pod-scheduler/generator.py | Converts schedules to template-shaped data and emits pipeline YAML. |
| scripts/pod-scheduler/config_loader.py | Loads pod scheduler JSON config into typed models. |
| scripts/pod-scheduler/README.md | Documents pod model, config format, and usage. |
| build/benchmarks_ci_pods.json | New CI pod/scenario configuration (84 runs). |
| build/benchmarks_ci_azure_pods.json | New Azure pod/scenario configuration (26 runs). |
| build/benchmarks_ci_cobalt_pods.json | New Cobalt hosted pod/scenario configuration (44 runs). |

Formula is now max(120, min(240, 2 * estimated_runtime)). This prevents scenarios with long runtimes (e.g. Proxies at 150min) from setting unreasonably high timeouts compared to previous values.

Resulting timeouts: 120 (default), 140 (Grpc), 180 (PGO/Containers), 240 (Proxies)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
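The clamp from this commit message, written out as a small function. The formula and the Proxies runtime of 150 minutes come from the message; the function name `job_timeout` is hypothetical, and the 70 and 90 minute runtimes are inferred by halving the stated 140 and 180 minute timeouts rather than quoted from the configs.

```python
def job_timeout(estimated_runtime: int) -> int:
    """Timeout in minutes: twice the estimated runtime, clamped to [120, 240]."""
    return max(120, min(240, 2 * estimated_runtime))


# 150 is stated for Proxies; 70 and 90 are inferred from the listed timeouts;
# 30 stands in for any scenario short enough to hit the 120-minute floor.
for runtime in (30, 70, 90, 150):
    print(runtime, job_timeout(runtime))
```

Without the upper bound, the 150-minute Proxies scenario would get a 300-minute timeout; the clamp caps it at 240.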

- Fix 4 incorrect template filenames in benchmarks_ci_pods.json: crossgen-scenarios -> crossgen2-scenarios, custom-proxies-scenarios -> proxies-custom-scenarios, single-file-scenarios -> singlefile-scenarios, websockets-scenarios -> websocket-scenarios
- Fix machine utilization calculation bug (was inflating totals for machines not in current stage)
- Remove unused imports (sys, Any, Dict, json, Pod)
- Remove dead render_with_liquid function and --template CLI arg
- Add guard against empty queues (ZeroDivisionError)
- Update README and docstrings to reflect removed template arg

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Code:
- Validate cron schedules at load time and raise on unsupported hour fields instead of silently no-op'ing the offset for split YAMLs
- Add optional 'timeout' override per scenario; fall back to the runtime-derived formula when absent
- Move pipeline plumbing (pool, service-bus connection/namespace) into JSON metadata.pipeline with the previous hardcoded values as defaults
- Strict validation of duplicate pods, duplicate scenario.pods entries, empty queues; default scheduler to fail-fast on unknown/invalid pod references with a --lenient opt-out
- Stricter job-id sanitization (handles '.', '/', parens, leading digits, unicode) and explicit duplicate detection in generated YAML
- Replace id(stage) bookkeeping in split_schedule with explicit indices; add stable name tie-breaker to create_schedule for deterministic output
- Use Run.job_name in the generator instead of duplicating the regex
- Drop stale '--template' arg from generated YAML headers and README

Tests:
- 41 unit + snapshot tests covering models, config loader, scheduler, generator, and YAML parity with the committed *_pods.json configs

Cleanup:
- Revert benchmarks.template.liquid and benchmarks_ci_azure.json to main; the deleted crank-scheduler does not consume them
- Regenerate all four pipeline YAMLs against the new generator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
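The job-id sanitization and duplicate detection described in this commit might be sketched as follows. This is not the PR's `Run.job_name` implementation: `sanitize_job_id` and `unique_job_ids` are hypothetical names, and the sketch assumes the target is an identifier made of ASCII letters, digits and underscores that does not start with a digit.

```python
import re
from typing import Dict, Iterable, List


def sanitize_job_id(name: str) -> str:
    """Collapse every run of characters outside [A-Za-z0-9_] into one underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    if not cleaned:
        raise ValueError(f"cannot derive a job id from {name!r}")
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned  # identifiers must not start with a digit
    return cleaned


def unique_job_ids(names: Iterable[str]) -> List[str]:
    """Sanitize all names and fail loudly if two of them collapse to the same id."""
    seen: Dict[str, str] = {}
    ids: List[str] = []
    for name in names:
        job_id = sanitize_job_id(name)
        if job_id in seen:
            raise ValueError(
                f"{name!r} and {seen[job_id]!r} both map to job id {job_id!r}"
            )
        seen[job_id] = name
        ids.append(job_id)
    return ids


print(sanitize_job_id("Json (28-core) v1.0/linux"))  # Json_28_core_v1_0_linux
print(sanitize_job_id("28core"))                      # _28core
```

Raising on a collision rather than appending a suffix keeps generated job names stable between regenerations, at the cost of requiring the config author to rename one of the clashing entries.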

The Liquid template was only consumed by the deleted crank-scheduler. The pod-scheduler renders pipeline YAML directly via Python, and grep confirms no other script, pipeline, or build step reads this file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

These were artifacts of the old hand-driven matrix.yml -> json -> Liquid template -> benchmarks.yml workflow. Their only inbound references were stale documentation comments cross-pointing between each other; nothing in the repo (no script, no pipeline) consumed them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Generated YAML headers now embed the exact regen command (with the source config and base name) and a pointer to scripts/pod-scheduler/README.md, so each file documents how to reproduce itself
- New build/README.md maps each *_pods.json config to the YAML it produces, lists the hand-maintained scenario templates, and explains the typical edit/regenerate workflow
- Top-level README.md gains a 'Continuous benchmarking pipelines' section linking to the pod-scheduler and build/ docs
- pod-scheduler README's Quick Start now uses repo-root-relative commands and points at the snapshot tests for verification
- Tests cover the new _format_source_path helper and the snapshot test passes the source config so headers stay verified

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

benchmarks_ci.json, benchmarks_ci_azure.json, and benchmarks_ci_cobalt.json used the old 'machines + capabilities' format consumed by the deleted crank-scheduler. Their replacements (benchmarks_ci_pods.json, benchmarks_ci_azure_pods.json, benchmarks_ci_cobalt_pods.json) drive the pod-scheduler. grep finds zero inbound references for any of the three across scripts, pipelines, docs, and tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

LoopedBard3 and others added 3 commits April 29, 2026 14:21

LoopedBard3 self-assigned this Apr 29, 2026

LoopedBard3 requested a review from Copilot April 29, 2026 21:34

Copilot started reviewing on behalf of LoopedBard3 April 29, 2026 21:35

Copilot AI reviewed Apr 29, 2026

LoopedBard3 and others added 7 commits April 29, 2026 14:56

LoopedBard3 marked this pull request as ready for review May 1, 2026 18:18

LoopedBard3 requested review from DrewScoggins and sebastienros May 1, 2026 19:54

LoopedBard3 wants to merge 11 commits into aspnet:main from LoopedBard3:pod-scheduler
