test(producer): add hdr-regression and hdr-hlg-regression test suites #365

Open
vanceingalls wants to merge 4 commits into main from vance/hdr-regression-tests

@vanceingalls
Collaborator

@vanceingalls vanceingalls commented Apr 21, 2026

Summary

Replace the trivial hdr-pq and hdr-image-only tests with two consolidated, time-windowed regression suites that exercise the full HDR pipeline. These goldens are the safety net for every other PR in this stack.

Why

The pre-existing HDR tests covered only a single full-bleed video or image with a static text label — none of the features that the HDR pipeline has to handle differently from SDR (opacity animation, z-ordered multi-layer compositing, transforms, border-radius clipping, shader transitions, multiple HDR sources, object-fit modes, mixed HDR+SDR layering, HLG transfer). This PR builds the missing safety net first so every subsequent fix can be proven correct.

What changed

  • New packages/producer/tests/hdr-regression/ (PQ, BT.2020, ~20 s, 1080p, 8 windows A–H):
    • A: static baseline (HDR video + DOM overlay)
    • B: wrapper-opacity fade
    • C: direct-on-<video> opacity tween (documents the Chunk 1 bug)
    • D: z-order sandwich (DOM → HDR → DOM)
    • E: two HDR videos side-by-side (pins PR #289, feat(hdr): z-ordered multi-layer compositing with PQ support)
    • F: rotation + scale + border-radius (documents the Chunk 4 bug)
    • G: object-fit: contain
    • H: shader crossfade between HDR video and HDR image
  • New packages/producer/tests/hdr-hlg-regression/ (HLG, ARIB STD-B67, ~5 s, 2 windows A–B) — exercises the separate HLG LUT/OETF code path that previously had zero coverage.
  • New scripts/generate-hdr-photo-pq.py synthesizes hdr-photo-pq.png with a cICP chunk for BT.2020/PQ/full.
  • Removed tests/hdr-pq/ and tests/hdr-image-only/.
  • Updated .github/workflows/regression.yml HDR shard to run the new pair sequentially.
  • All compositions follow the documented timed-element pattern (data-start, data-duration, class="clip" directly on each timed leaf — no wrapper inheritance).
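
For readers unfamiliar with the cICP chunk mentioned above, here is a minimal sketch of how a PNG can be tagged as BT.2020/PQ/full-range using only the standard library. It is not the contents of scripts/generate-hdr-photo-pq.py; the function names and the solid-colour image are assumptions made for illustration. The code points (primaries 9, transfer 16, matrix 0, full-range flag 1) follow ITU-T H.273 as used by the PNG cICP chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def cicp_chunk(primaries: int = 9, transfer: int = 16,
               matrix: int = 0, full_range: int = 1) -> bytes:
    """cICP payload is four bytes: primaries, transfer, matrix, range flag.

    Defaults are BT.2020 primaries (9), PQ / SMPTE ST 2084 (16),
    RGB identity matrix (0), full range (1).
    """
    return png_chunk(b"cICP", bytes([primaries, transfer, matrix, full_range]))


def solid_hdr_png(width: int, height: int, rgb16: tuple) -> bytes:
    """Hypothetical helper: a 16-bit RGB PNG filled with one PQ-coded colour."""
    # IHDR: width, height, bit depth 16, colour type 2 (RGB), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 2, 0, 0, 0)
    pixel = struct.pack(">HHH", *rgb16)
    row = b"\x00" + pixel * width  # leading 0 byte = filter type "None"
    idat = zlib.compress(row * height)
    # cICP must appear before IDAT so decoders see it before the pixel data.
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr) + cicp_chunk()
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))
```

Writing `solid_hdr_png(1920, 1080, (40000, 30000, 20000))` to disk should give a file that cICP-aware decoders treat as PQ content; decoders that ignore the chunk will show it as ordinary 16-bit RGB.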

Test plan

  • Goldens generated with bun run test:update --sequential.
  • ffprobe confirms HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).
  • Suite green with maxFrameFailures budgets that absorb the documented Chunk 1 / Chunk 4 known-fails — tightened in follow-up PRs in this stack.
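
The ffprobe check in the test plan can be scripted roughly as follows. This is a sketch, not the project's tooling: the function names are made up, and it assumes ffprobe is on PATH. The expected values are the ones listed above (HEVC, yuv420p10le, bt2020nc, and smpte2084 or arib-std-b67 depending on the suite).

```python
import json
import subprocess

# Stream tags the test plan expects on every golden.
EXPECTED_COMMON = {
    "codec_name": "hevc",
    "pix_fmt": "yuv420p10le",
    "color_space": "bt2020nc",
}
# Transfer characteristic differs per suite.
EXPECTED_TRANSFER = {"pq": "smpte2084", "hlg": "arib-std-b67"}


def probe_video_stream(path: str) -> dict:
    """Return ffprobe's metadata for the first video stream as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,pix_fmt,color_space,color_transfer",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"][0]


def hdr_tag_mismatches(stream: dict, kind: str) -> dict:
    """Map each wrong field to (actual, expected); empty dict means all good."""
    expected = {**EXPECTED_COMMON, "color_transfer": EXPECTED_TRANSFER[kind]}
    return {key: (stream.get(key), want)
            for key, want in expected.items()
            if stream.get(key) != want}
```

Usage would look like `hdr_tag_mismatches(probe_video_stream("output.mp4"), "pq")`, where the file name is a placeholder for whichever golden is being checked.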

Stack

Foundational PR for the HDR follow-ups stack (Chunk 0 of plans/hdr-followups.md). Every subsequent PR builds on this safety net.

Collaborator Author

vanceingalls commented Apr 21, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@vanceingalls vanceingalls marked this pull request as ready for review April 21, 2026 20:55
@vanceingalls vanceingalls marked this pull request as draft April 21, 2026 20:55
@vanceingalls vanceingalls marked this pull request as ready for review April 21, 2026 20:57
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from a54acd2 to ebe1483 Compare April 22, 2026 02:03
Collaborator

@jrusso1020 jrusso1020 left a comment

Foundational correctness for the stack — every PR after this one leans on these goldens to prove its fix is real. Test design is the right shape:

  • Eight PQ windows each exercise an orthogonal code path the old hdr-pq fixture never touched (wrapper-opacity, direct-<video>-opacity, z-order sandwich, multi-HDR, transform+radius, object-fit, shader transition).
  • Splitting the HLG suite out as hdr-hlg-regression (2 windows) is the right call — HLG's OETF and LUT live on a separate code path from PQ, and lumping them into one suite would mask HLG regressions under PQ dominance.
  • Windows C and F are deliberately set up as documented known-fails with wide maxFrameFailures budgets, tightened in follow-up PRs in this stack (#369 for C, #375 for F). That's the right pattern for landing safety nets before fixes — the fix PR tightens the budget, which mechanically proves the fix.
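
As background for why the two suites cannot stand in for each other, the sketch below writes out the two standard transfer curves from ITU-R BT.2100: the HLG OETF (ARIB STD-B67) and the PQ inverse EOTF (SMPTE ST 2084). It illustrates the published formulas only and makes no claim about how the producer's LUT code is structured; the function names are assumptions.

```python
import math

# HLG OETF constants (ARIB STD-B67 / BT.2100).
HLG_A = 0.17883277
HLG_B = 1 - 4 * HLG_A
HLG_C = 0.5 - HLG_A * math.log(4 * HLG_A)


def hlg_oetf(e: float) -> float:
    """Scene-linear light in [0, 1] -> HLG signal in [0, 1]."""
    if e <= 1 / 12:
        return math.sqrt(3 * e)                   # square-root segment
    return HLG_A * math.log(12 * e - HLG_B) + HLG_C  # logarithmic segment


# PQ constants (SMPTE ST 2084 / BT.2100).
PQ_M1 = 2610 / 16384
PQ_M2 = 2523 / 4096 * 128
PQ_C1 = 3424 / 4096
PQ_C2 = 2413 / 4096 * 32
PQ_C3 = 2392 / 4096 * 32


def pq_inverse_eotf(nits: float) -> float:
    """Display luminance in cd/m^2 (0..10000) -> PQ signal in [0, 1]."""
    y = max(nits, 0.0) / 10000.0
    yp = y ** PQ_M1
    return ((PQ_C1 + PQ_C2 * yp) / (1 + PQ_C3 * yp)) ** PQ_M2
```

HLG is relative and scene-referred while PQ is absolute and display-referred, so a bug in one curve's handling would not necessarily show up in frames encoded with the other.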

The regression-suite contract in the README (HDR frames reach the encoder in RGB48 BT.2020 10-bit without sRGB→BT.709 round-tripping, opacity respected on timed leaves, transforms don't clobber the HDR layer's pixel buffer) is the right set of invariants to pin.

Two non-blocking observations:

  1. scripts/generate-hdr-photo-pq.py should live somewhere discoverable for future contributors. Right now it's tucked under tests/hdr-regression/scripts/ which is fine, but a one-line README.md note on "to regenerate, run python scripts/generate-hdr-photo-pq.py" is worth it so the synthesized golden doesn't become a black box.
  2. maxFrameFailures budget values for the not-known-fail windows would benefit from a brief comment on where the tolerance came from (e.g. "allows up to N frame diffs from codec noise"). Without it, a future contributor tightening budgets can't tell what room is real and what's codec-jitter slack.

CI shows one failure on styles-a, which is unrelated to the new suites (different test shard) and likely a pre-existing flake on main.

Approved.

Rames Jusso

@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from 04e4040 to b03abc3 Compare April 22, 2026 05:09
@vanceingalls
Collaborator Author

Both observations addressed in 75c6afa:

  1. generate-hdr-photo-pq.py discoverability: hdr-regression/README.md now points at the script in the regeneration block (lines 22-26), so it's not a black box anymore.
  2. maxFrameFailures budget rationale: hdr-hlg-regression/README.md got the new Tolerance section (maxFrameFailures: 0, with the reasoning that HLG is a pure pass-through and rgb48le → HEVC is byte-deterministic, so any drift is a real regression). For the PQ suite, the same commit also collapsed the README from 8 windows down to 4 (A–D) after #369 (test(hdr-regression): tighten Window C maxFrameFailures budget after Chunk 1 fix; window C re-baselined) and #375 (test(hdr-regression): tighten Window F maxFrameFailures budget after Chunk 4 fix; window F fixed) landed; the budget table I'd intended for the PQ side got dropped in that simplification. The current maxFrameFailures: 30 on the PQ suite is now under-documented. Happy to add a follow-up commit to put the budget breakdown back if you'd like, but flagging it candidly here rather than back-justifying a number.

Thanks for the review.

@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch 2 times, most recently from 632e75f to 3734a6c Compare April 22, 2026 06:23
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from 3734a6c to e9cfefd Compare April 22, 2026 16:29
vanceingalls added a commit that referenced this pull request Apr 22, 2026
Address jrusso1020's nit on PR #365 (non-blocking review): both READMEs now
explain where the tolerance values come from.

- hdr-regression/README.md: add a budget-breakdown table that derives the 30
  frames from the deltas in PRs #369 (window C fix → 5) and #375 (window F
  fix → 0). The table doubles as a contract: if a future change forces the
  budget back up, exactly one bucket has regressed and the table tells you
  which one to investigate first.
- hdr-hlg-regression/README.md: add a 'Tolerance' section explaining why 0
  is the right floor (HLG is a pure pass-through path, HEVC over rgb48le is
  byte-deterministic on the same fixture, so any drift is a real regression).

The regeneration command for generate-hdr-photo-pq.py was already documented
at README lines 67-71, so no changes needed there.
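
The "table doubles as a contract" idea can be sketched in code. Everything below is hypothetical: the class and function names are invented, and only the Window C (5) and Window F (0) figures and the suite total of 30 come from this PR. The remaining bucket is a placeholder so the example sums to 30; the real breakdown lives in hdr-regression/README.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowBudget:
    window: str
    max_frame_failures: int
    rationale: str


def suite_budget(buckets: list) -> int:
    """The suite-level maxFrameFailures is the sum of the per-window buckets."""
    return sum(b.max_frame_failures for b in buckets)


def regressed_windows(buckets: list, observed: dict) -> list:
    """Windows whose observed failing-frame count exceeds their bucket."""
    return [b.window for b in buckets
            if observed.get(b.window, 0) > b.max_frame_failures]


# Illustrative table: C and F are from the commit message above,
# "other" is a PLACEHOLDER value, not a number taken from the README.
EXAMPLE_BUDGETS = [
    WindowBudget("C", 5, "residual after the Chunk 1 fix (#369)"),
    WindowBudget("F", 0, "fixed by the Chunk 4 fix (#375)"),
    WindowBudget("other", 25, "placeholder for the remaining windows"),
]
```

With this shape, `regressed_windows(EXAMPLE_BUDGETS, {"C": 7})` points at Window C as the bucket to investigate first, which is the behaviour the commit message describes.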
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from c51e2a3 to 315ab25 Compare April 22, 2026 17:07
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from 315ab25 to 8c73295 Compare April 22, 2026 17:12
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch 3 times, most recently from 3495f7a to 7334041 Compare April 22, 2026 18:59
vanceingalls and others added 3 commits April 22, 2026 12:30
Replace the old hdr-pq + hdr-image-only tests with two consolidated
regression suites that exercise the full HDR pipeline.

hdr-regression (PQ, BT.2020, ~20s):
- 8 windows (A-H) covering clip-only video, image+video composition,
  wrapper opacity, direct-on-video opacity, scene transitions, transform
  + border-radius, mid-clip cuts, and shader transitions.
- Reuses the existing hdr-clip.mp4 fixture (NOTICE.md preserved).
- New hdr-photo-pq.png generated via scripts/generate-hdr-photo-pq.py
  (writes a cICP chunk for BT.2020/PQ/full).

hdr-hlg-regression (HLG, ARIB STD-B67, ~5s):
- 2 windows (A-B) covering clip-only HLG playback and HLG + opacity tween.
- New hdr-hlg-clip.mp4 fixture (last 5s of a user-recorded HLG iPhone clip).

Both compositions follow the documented timed-element pattern: data-start,
data-duration, and class="clip" applied directly to each timed leaf
element (no wrapper inheritance).
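
A rough way to check the timed-element pattern mechanically is sketched below. It is an assumed lint, not something in this repo: it flags any element that carries data-start or data-duration without the full trio of data-start, data-duration, and the clip class. It cannot tell whether an element is really a leaf, so a fully attributed wrapper would still pass.

```python
from html.parser import HTMLParser


class TimedLeafLint(HTMLParser):
    """Collects elements whose timing attributes are incomplete."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (tag, line number, list of missing pieces)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "data-start" not in attr and "data-duration" not in attr:
            return  # untimed element, nothing to check
        missing = [name for name in ("data-start", "data-duration")
                   if name not in attr]
        # class may be absent or valueless, so fall back to an empty string.
        if "clip" not in (attr.get("class") or "").split():
            missing.append('class="clip"')
        if missing:
            self.problems.append((tag, self.getpos()[0], missing))


def lint_composition(html: str) -> list:
    """Return a list of problems; an empty list means the pattern holds."""
    parser = TimedLeafLint()
    parser.feed(html)
    parser.close()
    return parser.problems
```

For example, `lint_composition('<div data-start="0"><video src="hdr-clip.mp4"></video></div>')` reports the wrapper div as missing data-duration and the clip class.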

CI: regression workflow's hdr shard now runs the new pair sequentially.
LFS: new MP4 fixtures and golden outputs are tracked via existing rules.

Goldens generated with bun run test:update --sequential.
ffprobe verifies HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).

Made-with: Cursor
Docker image builds can take 14-19 min on cache miss, leaving
insufficient time for HDR and style regression tests within 40 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address jrusso1020's nit on PR #365 (non-blocking review): both READMEs now
explain where the tolerance values come from.

- hdr-regression/README.md: add a budget-breakdown table that derives the 30
  frames from the deltas in PRs #369 (window C fix → 5) and #375 (window F
  fix → 0). The table doubles as a contract: if a future change forces the
  budget back up, exactly one bucket has regressed and the table tells you
  which one to investigate first.
- hdr-hlg-regression/README.md: add a 'Tolerance' section explaining why 0
  is the right floor (HLG is a pure pass-through path, HEVC over rgb48le is
  byte-deterministic on the same fixture, so any drift is a real regression).

The regeneration command for generate-hdr-photo-pq.py was already documented
at README lines 67-71, so no changes needed there.
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from 7334041 to 6b49885 Compare April 22, 2026 19:34
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vanceingalls vanceingalls force-pushed the vance/hdr-regression-tests branch from 6b49885 to 35b9f0a Compare April 22, 2026 20:48