
Replace identical data test with precise delta validation test #171

Merged
asamal4 merged 3 commits into lightspeed-core:main from asamal4:pr-160-fix
Feb 25, 2026

Conversation

Collaborator

@asamal4 asamal4 commented Feb 25, 2026

Original PR: #160
Additionally fixed a lint issue, restored the relative-change assertion, and added a statistical non-significance assertion.

Replaced test_compare_score_distributions_identical_data with test_compare_score_distributions_precise_delta to provide more meaningful validation.
The new test validates behavior with a precise 0.001 mean difference instead of identical values, using a reasonable floating-point tolerance (1e-6). It verifies the mean calculations, the relative change percentage, and that the statistical tests correctly report non-significance for small differences at small sample sizes.

This change improves test coverage by validating real-world scenarios where differences are measurable but not statistically significant, rather than testing the trivial case of identical data where difference is exactly zero.
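For context, the delta-based scenario can be sketched roughly as follows; the sample values and variable names here are illustrative assumptions, not the repo's actual fixtures:

```python
# Hypothetical data; the real test file may use different fixtures.
from statistics import mean
import math

scores1 = [0.80, 0.85, 0.90, 0.75, 0.95]
delta = 0.001
scores2 = [s + delta for s in scores1]  # precise 0.001 mean difference

mean1, mean2 = mean(scores1), mean(scores2)
mean_difference = mean2 - mean1
relative_change = mean_difference / mean1 * 100  # in percent

# Floating-point tolerance of 1e-6, as described in the PR.
assert math.isclose(mean_difference, delta, abs_tol=1e-6)
assert math.isclose(relative_change, 100 * delta / mean1, abs_tol=1e-6)
```

With a constant offset, the mean difference equals the delta exactly (up to floating-point error), which is what makes the 1e-6 tolerance a reasonable choice.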

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Improved test coverage for evaluation comparisons by switching to a precise small-delta scenario. Tests now validate detection and reporting of a 0.001 mean shift while confirming statistical non-significance, increasing confidence in calculation accuracy for subtle differences.

Priscila Gutierres and others added 2 commits February 25, 2026 15:34
  Replaced test_compare_score_distributions_identical_data with test_compare_score_distributions_precise_delta to provide more meaningful validation.
  The new test validates behavior with a precise 0.001 mean difference instead of identical values, using reasonable floating-point tolerance (1e-6) and verifying mean calculations,
  relative change percentage, and that statistical tests correctly report non-significance for small differences with small sample sizes.

  This change improves test coverage by validating real-world scenarios where differences are measurable but not statistically significant,
  rather than testing the trivial case of identical data where difference is exactly zero.
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6675a21 and d7f22fe.

📒 Files selected for processing (1)
  • tests/script/test_compare_evaluations.py

Walkthrough

A unit test in evaluation comparison tests was renamed and refactored: from validating identical score distributions to validating a precise delta (scores2 = scores1 + 0.001). Assertions now check computed means, mean_difference, and relative_change against expected values with approximate tolerances and non-significance of statistical tests.

Changes

Cohort / File(s) Summary
Test Method Refactor
tests/script/test_compare_evaluations.py
Renamed test_compare_score_distributions_identical_data to test_compare_score_distributions_precise_delta (applied to two occurrences) and replaced the identical-data scenario with a delta-based scenario. Updated test data to use a 0.001 offset, recomputed expected means/difference, and adjusted assertions to verify mean values, mean_difference, relative_change, and non-significance of statistical tests with approximate matching.
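The non-significance claim can be illustrated with a self-contained sketch (the sample scores below are assumptions, not the file's actual fixtures): with a constant 0.001 shift and only five samples per group, the pooled two-sample t-statistic is tiny compared with the ~2.306 critical value for eight degrees of freedom at alpha = 0.05.

```python
# Illustrative only: sample scores are assumed, not taken from the test file.
from statistics import mean, variance
from math import sqrt

scores1 = [0.80, 0.85, 0.90, 0.75, 0.95]
scores2 = [s + 0.001 for s in scores1]  # precise 0.001 delta
n = len(scores1)

# Pooled two-sample t-statistic (equal variances, since scores2 is a pure shift).
pooled_var = (variance(scores1) + variance(scores2)) / 2
t_stat = (mean(scores2) - mean(scores1)) / sqrt(pooled_var * 2 / n)

print(round(t_stat, 4))  # ~0.02, far below the 2.306 critical value for df=8
assert abs(t_stat) < 2.306  # not significant at alpha = 0.05
```

This is why a measurable 0.001 difference should still be reported as statistically non-significant at these sample sizes.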

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title accurately describes the main change: replacing the identical-data test with one validating a precise 0.001 delta)
  • Docstring Coverage: ✅ Passed (100.00% coverage, above the required 80.00% threshold)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/script/test_compare_evaluations.py`:
- Around line 210-222: Add assertions that validate the relative change and
non-significance: compute expected_relative = expected_diff / expected_mean1 and
assert result["relative_change"] == pytest.approx(expected_relative), and assert
the test result reports non-significance by checking the boolean flag (e.g.,
result["significant"] is False) or the explicit key your code uses (e.g.,
result["statistically_significant"] is False); place these after the existing
mean and mean_difference assertions to complete the scenario for result (keys:
"relative_change" and the significance flag).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6441d75 and 6675a21.

📒 Files selected for processing (1)
  • tests/script/test_compare_evaluations.py

Comment thread tests/script/test_compare_evaluations.py
Member

@VladimirKadlec VladimirKadlec left a comment


LGTM

@asamal4 asamal4 merged commit 720e184 into lightspeed-core:main Feb 25, 2026
15 checks passed
rioloc pushed a commit to rioloc/lightspeed-evaluation that referenced this pull request Feb 25, 2026
…speed-core#171)

* Replace identical data test with precise delta validation test


* fix lint issues

* add rel_change and stat significance asserts

---------

Co-authored-by: Priscila Gutierres <prgutier@redhat.com>
emac-E pushed a commit to emac-E/lightspeed-evaluation that referenced this pull request Apr 10, 2026
…speed-core#171)

* Replace identical data test with precise delta validation test


* fix lint issues

* add rel_change and stat significance asserts

---------

Co-authored-by: Priscila Gutierres <prgutier@redhat.com>
