
Replace identical data test with precise delta validation test #171

Merged
asamal4 merged 3 commits into lightspeed-core:main from asamal4:pr-160-fix
Feb 25, 2026

Conversation

Collaborator

@asamal4 asamal4 commented Feb 25, 2026

Original PR: #160
Additionally fixed a lint issue, restored the relative-change assertion, and added a statistical non-significance assertion.

Replaced test_compare_score_distributions_identical_data with test_compare_score_distributions_precise_delta to provide more meaningful validation.
The new test validates behavior with a precise 0.001 mean difference instead of identical values, using a reasonable floating-point tolerance (1e-6). It verifies the mean calculations, the relative change percentage, and that the statistical tests correctly report non-significance for small differences at small sample sizes.

This change improves test coverage by validating real-world scenarios where differences are measurable but not statistically significant, rather than testing the trivial case of identical data where difference is exactly zero.
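For context, the delta-based scenario can be sketched roughly as follows; the sample values and variable names here are illustrative assumptions, not the repo's actual fixtures:

```python
# Hypothetical data; the real test file may use different fixtures.
from statistics import mean
import math

scores1 = [0.80, 0.85, 0.90, 0.75, 0.95]
delta = 0.001
scores2 = [s + delta for s in scores1]  # precise 0.001 mean difference

mean1, mean2 = mean(scores1), mean(scores2)
mean_difference = mean2 - mean1
relative_change = mean_difference / mean1 * 100  # in percent

# Floating-point tolerance of 1e-6, as described in the PR.
assert math.isclose(mean_difference, delta, abs_tol=1e-6)
assert math.isclose(relative_change, 100 * delta / mean1, abs_tol=1e-6)
```

With a constant offset, the mean difference equals the delta exactly (up to floating-point error), which is what makes the 1e-6 tolerance a reasonable choice.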

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Tests
    • Improved test coverage for evaluation comparisons by switching to a precise small-delta scenario. Tests now validate detection and reporting of a 0.001 mean shift while confirming statistical non-significance, increasing confidence in calculation accuracy for subtle differences.

Priscila Gutierres and others added 2 commits February 25, 2026 15:34
  Replaced test_compare_score_distributions_identical_data with test_compare_score_distributions_precise_delta to provide more meaningful validation.
  The new test validates behavior with a precise 0.001 mean difference instead of identical values, using reasonable floating-point tolerance (1e-6) and verifying mean calculations,
  relative change percentage, and that statistical tests correctly report non-significance for small differences with small sample sizes.

  This change improves test coverage by validating real-world scenarios where differences are measurable but not statistically significant,
  rather than testing the trivial case of identical data where difference is exactly zero.
@coderabbitai
Contributor

coderabbitai bot commented Feb 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6675a21 and d7f22fe.

📒 Files selected for processing (1)
  • tests/script/test_compare_evaluations.py

Walkthrough

A unit test in evaluation comparison tests was renamed and refactored: from validating identical score distributions to validating a precise delta (scores2 = scores1 + 0.001). Assertions now check computed means, mean_difference, and relative_change against expected values with approximate tolerances and non-significance of statistical tests.

Changes

Cohort / File(s) Summary
Test Method Refactor
tests/script/test_compare_evaluations.py
Renamed test_compare_score_distributions_identical_data to test_compare_score_distributions_precise_delta (applied to two occurrences) and replaced the identical-data scenario with a delta-based scenario. Updated test data to use a 0.001 offset, recomputed expected means/difference, and adjusted assertions to verify mean values, mean_difference, relative_change, and non-significance of statistical tests with approximate matching.
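The non-significance claim can be illustrated with a self-contained sketch (the sample scores below are assumptions, not the file's actual fixtures): with a constant 0.001 shift and only five samples per group, the pooled two-sample t-statistic is tiny compared with the ~2.306 critical value for eight degrees of freedom at alpha = 0.05.

```python
# Illustrative only: sample scores are assumed, not taken from the test file.
from statistics import mean, variance
from math import sqrt

scores1 = [0.80, 0.85, 0.90, 0.75, 0.95]
scores2 = [s + 0.001 for s in scores1]  # precise 0.001 delta
n = len(scores1)

# Pooled two-sample t-statistic (equal variances, since scores2 is a pure shift).
pooled_var = (variance(scores1) + variance(scores2)) / 2
t_stat = (mean(scores2) - mean(scores1)) / sqrt(pooled_var * 2 / n)

print(round(t_stat, 4))  # ~0.02, far below the 2.306 critical value for df=8
assert abs(t_stat) < 2.306  # not significant at alpha = 0.05
```

This is why a measurable 0.001 difference should still be reported as statistically non-significant at these sample sizes.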

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title accurately describes the main change: replacing the identical-data test with one validating a precise 0.001 delta)
  • Docstring Coverage: ✅ Passed (100.00% coverage, above the required 80.00% threshold)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/script/test_compare_evaluations.py`:
- Around line 210-222: Add assertions that validate the relative change and
non-significance: compute expected_relative = expected_diff / expected_mean1 and
assert result["relative_change"] == pytest.approx(expected_relative), and assert
the test result reports non-significance by checking the boolean flag (e.g.,
result["significant"] is False) or the explicit key your code uses (e.g.,
result["statistically_significant"] is False); place these after the existing
mean and mean_difference assertions to complete the scenario for result (keys:
"relative_change" and the significance flag).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6441d75 and 6675a21.

📒 Files selected for processing (1)
  • tests/script/test_compare_evaluations.py

Comment thread tests/script/test_compare_evaluations.py
Member

@VladimirKadlec VladimirKadlec left a comment


LGTM

@asamal4 asamal4 merged commit 720e184 into lightspeed-core:main Feb 25, 2026
15 checks passed
rioloc pushed a commit to rioloc/lightspeed-evaluation that referenced this pull request Feb 25, 2026
…speed-core#171)

* Replace identical data test with precise delta validation test


* fix lint issues

* add rel_change and stat significance asserts

---------

Co-authored-by: Priscila Gutierres <prgutier@redhat.com>
emac-E pushed a commit to emac-E/lightspeed-evaluation that referenced this pull request Apr 10, 2026
…speed-core#171)

* Replace identical data test with precise delta validation test


* fix lint issues

* add rel_change and stat significance asserts

---------

Co-authored-by: Priscila Gutierres <prgutier@redhat.com>
