
chore: version selection benchmarking #1064

Open

adityachoudhari26 wants to merge 1 commit into main from version-selector-benchmarking

Conversation

adityachoudhari26 (Member) commented Apr 24, 2026

resolves #1062

Summary by CodeRabbit

  • Tests
    • Added performance benchmarks for internal selector evaluation mechanisms to measure execution efficiency and compile-time costs.

Copilot AI review requested due to automatic review settings, April 24, 2026 23:31
coderabbitai (bot) commented Apr 24, 2026

📝 Walkthrough

A benchmark test file is introduced for the versionselector package to measure CEL selector performance. It compares compiled CEL program execution against a native Go equality check and evaluates compilation cost, using deterministic synthetic deployment metadata.

Changes

Cohort / File(s): Benchmark Tests
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/versionselector/metadata_selector_bench_test.go

Summary: New benchmark file containing three benchmarks: BenchmarkMetadataSelector_Eval measures steady-state CEL program execution, BenchmarkMetadataSelector_NativeEq provides a hand-written Go baseline, and BenchmarkMetadataSelector_Compile evaluates compilation overhead. Includes a helper function compileUncached and pre-generates deterministic per-version contexts with configurable corpus and metadata map sizes.
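
To make the shape of these benchmarks concrete, here is a minimal, self-contained sketch of an Eval-style CEL benchmark, assuming a cel-go environment that exposes a metadata map variable; the expression, corpus size, and names are illustrative, not the PR's actual code:

    package versionselector_test

    import (
        "fmt"
        "testing"

        "github.com/google/cel-go/cel"
    )

    func BenchmarkEvalSketch(b *testing.B) {
        env, err := cel.NewEnv(
            cel.Variable("metadata", cel.MapType(cel.StringType, cel.StringType)),
        )
        if err != nil {
            b.Fatal(err)
        }
        ast, iss := env.Compile(`metadata["env"] == "prod"`)
        if iss != nil && iss.Err() != nil {
            b.Fatal(iss.Err())
        }
        prg, err := env.Program(ast)
        if err != nil {
            b.Fatal(err)
        }

        // Deterministic synthetic corpus: every third version is "prod".
        contexts := make([]map[string]any, 1024)
        for i := range contexts {
            envName := "dev"
            if i%3 == 0 {
                envName = "prod"
            }
            contexts[i] = map[string]any{
                "metadata": map[string]string{"env": envName, "id": fmt.Sprintf("v%d", i)},
            }
        }

        b.ResetTimer()
        matches := 0
        for i := 0; i < b.N; i++ {
            matches = 0
            for _, ctx := range contexts {
                out, _, err := prg.Eval(ctx)
                if err != nil {
                    b.Fatal(err)
                }
                if out.Value() == true {
                    matches++
                }
            }
        }
        _ = matches // keep the aggregate result referenced
    }

Running it with go test -bench BenchmarkEvalSketch -benchmem reports per-iteration timings and allocation counts.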

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A rabbit hops through benchmarks bright,
Testing CEL at rapid flight,
Native code and compiled spells,
Measuring speed where performance dwells! ⚡

🚥 Pre-merge checks: 5 passed

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately summarizes the main change (introducing benchmarking for version selection) with a clear, concise description.
  • Linked Issues Check ✅ Passed: the PR implements CEL selector performance benchmarks (BenchmarkMetadataSelector_Eval, BenchmarkMetadataSelector_Compile, BenchmarkMetadataSelector_NativeEq) that directly address the objective in #1062 to create deployment version CEL filtering benchmarks.
  • Out of Scope Changes Check ✅ Passed: all changes are within scope; the benchmark file and helper functions are directly related to the CEL filtering performance benchmarks requested in #1062.
  • Docstring Coverage ✅ Passed: docstring coverage is 100.00%, above the required threshold of 80.00%.




Copilot AI left a comment

Pull request overview

Adds a Go benchmark suite to measure CEL-based deployment version metadata selector performance, addressing issue #1062.

Changes:

  • Introduces benchmarks for steady-state CEL evaluation across selector shapes, corpus sizes, and metadata map sizes.
  • Adds a native Go baseline benchmark for the env == "prod" check to estimate CEL overhead (see the sketch after this list).
  • Adds a compile-only benchmark intended to measure compilation cost without program-cache hits.
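
For reference, such a native baseline usually reduces to a map lookup plus a string comparison per version; a minimal sketch (slice and key names assumed, not taken from the PR):

    // Hand-written Go equivalent of the env == "prod" CEL check.
    func BenchmarkNativeEqSketch(b *testing.B) {
        metas := make([]map[string]string, 1024)
        for i := range metas {
            envName := "dev"
            if i%3 == 0 {
                envName = "prod"
            }
            metas[i] = map[string]string{"env": envName}
        }
        b.ResetTimer()
        matches := 0
        for i := 0; i < b.N; i++ {
            matches = 0
            for _, m := range metas {
                if m["env"] == "prod" {
                    matches++
                }
            }
        }
        _ = matches // keep the aggregate result referenced
    }

Comparing its ns/op against the CEL benchmark's gives a rough multiplier for interpreter overhead.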


Comment on lines +58 to +60
    for i := range n {
        meta := make(map[string]any, mapSize)

Copilot AI commented Apr 24, 2026

for i := range n is invalid because range can’t iterate over an int. This won’t compile; use an indexed loop to fill contexts (e.g., increment i from 0..n-1).
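
Worth noting: since Go 1.22, range over an integer is valid Go (for i := range n iterates i from 0 through n-1), so whether this finding applies depends on the module's go directive. A counted-loop form that also compiles on older toolchains, using the PR's identifiers:

    for i := 0; i < n; i++ {
        meta := make(map[string]any, mapSize)
        // ... populate meta and the per-version context as before ...
    }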

Comment on lines +77 to +79
    for k := range mapSize - len(meta) {
        meta[fmt.Sprintf("filler_%d", k)] = fmt.Sprintf("value_%d", r.Intn(1_000_000))
    }
Copilot AI commented Apr 24, 2026

for k := range mapSize - len(meta) is invalid (range can’t iterate over an int, and mapSize - len(meta) is an int expression). This prevents the benchmark from compiling; switch to a standard counted loop, and guard against mapSize < len(meta) if that’s possible.
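
As above, this form is legal on Go 1.22+, where the range expression is evaluated once before the loop starts. A pre-1.22 counted loop needs the bound hoisted into a variable, since len(meta) grows as the loop inserts keys; hoisting also doubles as the suggested guard, because a negative count simply skips the loop:

    fillers := mapSize - len(meta) // evaluated once; negative means nothing to add
    for k := 0; k < fillers; k++ {
        meta[fmt.Sprintf("filler_%d", k)] = fmt.Sprintf("value_%d", r.Intn(1_000_000))
    }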

    for i := 0; i < b.N; i++ {
        matches = 0
        for _, ctx := range contexts {
            ok, _ := evaluate(program, ctx)
Copilot AI commented Apr 24, 2026

The benchmark ignores the error returned by evaluate(program, ctx). If evaluation fails (e.g., due to a non-missing-key runtime error), the benchmark will silently treat it as a non-match and continue, which can invalidate results. Consider failing the benchmark on non-nil error (or at least counting/reporting errors).

Suggested change

    -ok, _ := evaluate(program, ctx)
    +ok, err := evaluate(program, ctx)
    +if err != nil {
    +    b.Fatalf("evaluate %q: %v", shape.label, err)
    +}

Comment on lines +170 to +171
    // ristretto cache in compiledEnv by building a fresh env per run. Useful to
    // size cache-miss impact separately from the steady-state Eval cost.
Copilot AI commented Apr 24, 2026

The comment claims BenchmarkMetadataSelector_Compile bypasses the cache “by building a fresh env per run”, but compileUncached uses compiledEnv.Env() which returns the existing underlying *cel.Env (no new env is created). Update the comment to reflect it only bypasses the ristretto program cache, or create a new env per iteration if that’s what you want to measure.

Suggested change

    -// ristretto cache in compiledEnv by building a fresh env per run. Useful to
    -// size cache-miss impact separately from the steady-state Eval cost.
    +// ristretto program cache in compiledEnv. Useful to size cache-miss impact
    +// separately from the steady-state Eval cost.

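If the intent really were to rebuild the environment each run, the compile benchmark could construct a fresh cel.Env per iteration, measuring env construction plus parse, check, and planning together. A sketch assuming cel-go, with an illustrative expression and variable declaration:

    func BenchmarkCompileFreshEnvSketch(b *testing.B) {
        const expr = `metadata["env"] == "prod"`
        for i := 0; i < b.N; i++ {
            // Fresh env each iteration: nothing survives between runs.
            env, err := cel.NewEnv(
                cel.Variable("metadata", cel.MapType(cel.StringType, cel.StringType)),
            )
            if err != nil {
                b.Fatal(err)
            }
            ast, iss := env.Compile(expr)
            if iss != nil && iss.Err() != nil {
                b.Fatal(iss.Err())
            }
            if _, err := env.Program(ast); err != nil {
                b.Fatal(err)
            }
        }
    }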
coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/versionselector/metadata_selector_bench_test.go`:
- Around lines 112-116: The benchmark currently ignores errors from evaluate
  (ok, _ := evaluate(...)), which can hide failures; change the call to capture
  the error (ok, err := evaluate(program, ctx)) and handle it (e.g., b.Fatalf
  or b.Errorf + continue) so the benchmark fails loudly on evaluator errors;
  keep the matches increment logic only when ok is true and include the error
  in the failure message to aid debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07deb073-a2c8-4241-96e1-a0dfe3758a83

📥 Commits

Reviewing files that changed from the base of the PR and between 451e4f6 and 91506e5.

📒 Files selected for processing (1)
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/versionselector/metadata_selector_bench_test.go

Comment on lines +112 to +116
    for i := 0; i < b.N; i++ {
        matches = 0
        for _, ctx := range contexts {
            ok, _ := evaluate(program, ctx)
            if ok {
coderabbitai (bot) commented

⚠️ Potential issue | 🟠 Major

Handle evaluator errors instead of dropping them.

At Line 115, the benchmark discards evaluate errors (ok, _ := ...). If eval starts failing (type changes, env changes, expression edge cases), the benchmark silently reports misleading throughput.

Proposed fix
-						ok, _ := evaluate(program, ctx)
+						ok, err := evaluate(program, ctx)
+						if err != nil {
+							b.Fatalf("evaluate shape=%q failed: %v", shape.label, err)
+						}
 						if ok {
 							matches++
 						}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

     for i := 0; i < b.N; i++ {
         matches = 0
         for _, ctx := range contexts {
    -        ok, _ := evaluate(program, ctx)
    +        ok, err := evaluate(program, ctx)
    +        if err != nil {
    +            b.Fatalf("evaluate shape=%q failed: %v", shape.label, err)
    +        }
             if ok {
                 matches++
             }



Development

Successfully merging this pull request may close these issues.

perf: deployment version CEL filtering benchmarks

2 participants