test(producer): add hdr-regression and hdr-hlg-regression test suites #365
Open
vanceingalls wants to merge 4 commits into main
Collaborator
Author
This was referenced Apr 21, 2026
Open
perf(producer): hdr benchmark harness — --tags filter, peak heap/RSS tracking, bench:hdr script
#382
Open
a54acd2 to ebe1483
jrusso1020
approved these changes
Apr 22, 2026
Collaborator
jrusso1020
left a comment
Foundational correctness for the stack — every PR after this one leans on these goldens to prove its fix is real. Test design is the right shape:
- Eight PQ windows each exercise an orthogonal code path the old hdr-pq fixture never touched (wrapper-opacity, direct-<video>-opacity, z-order sandwich, multi-HDR, transform+radius, object-fit, shader transition).
- Splitting the HLG suite out as hdr-hlg-regression (2 windows) is the right call: HLG's OETF and LUT live on a separate code path from PQ, and lumping them into one suite would mask HLG regressions under PQ dominance.
- Windows C and F are deliberately set up as documented known-fails with wide maxFrameFailures budgets, tightened in follow-up PRs in this stack (#369 for C, #375 for F). That's the right pattern for landing safety nets before fixes: the fix PR tightens the budget, which mechanically proves the fix.
The regression-suite contract in the README (HDR frames reach the encoder in RGB48 BT.2020 10-bit without sRGB→BT.709 round-tripping, opacity respected on timed leaves, transforms don't clobber the HDR layer's pixel buffer) is the right set of invariants to pin.
Two non-blocking observations:
- scripts/generate-hdr-photo-pq.py should live somewhere discoverable for future contributors. Right now it's tucked under tests/hdr-regression/scripts/ which is fine, but a one-line README.md note on "to regenerate, run python scripts/generate-hdr-photo-pq.py" is worth it so the synthesized golden doesn't become a black box.
- maxFrameFailures budget values for the not-known-fail windows would benefit from a brief comment on where the tolerance came from (e.g. "allows up to N frame diffs from codec noise"). Without it, a future contributor tightening budgets can't tell what room is real and what's codec-jitter slack.
CI shows one failure on styles-a. It is unrelated to the new suites (different test shard) and is likely a pre-existing flake on main.
Approved.
— Rames Jusso
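The budget pattern described in this review can be sketched as follows. This is a minimal illustration, not the producer test harness: it assumes each rendered frame gets a similarity score against its golden and that a frame fails when the score drops below a threshold. Only maxFrameFailures appears in this thread; `evaluate`, `min_score` and `SuiteResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteResult:
    """Outcome of comparing rendered frames against goldens (hypothetical shape)."""
    failed_frames: int
    budget: int

    @property
    def passed(self) -> bool:
        # The suite passes while failures stay within the allowed budget.
        return self.failed_frames <= self.budget


def evaluate(frame_scores: list[float], min_score: float, max_frame_failures: int) -> SuiteResult:
    """Count frames scoring below min_score and compare against the budget."""
    failed = sum(1 for score in frame_scores if score < min_score)
    return SuiteResult(failed_frames=failed, budget=max_frame_failures)
```

Under these assumptions, a known-fail window passes with a wide budget, and the same render fails once the budget is tightened unless the underlying bug is actually fixed. That is the sense in which a tightened budget proves the fix.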
04e4040 to b03abc3
Collaborator
Author
Both observations addressed in 75c6afa:
Thanks for the review.
632e75f to 3734a6c
3734a6c to e9cfefd
vanceingalls
added a commit
that referenced
this pull request
Apr 22, 2026
Address jrusso1020's nit on PR #365 (non-blocking review): both READMEs now explain where the tolerance values come from.
- hdr-regression/README.md: add a budget-breakdown table that derives the 30 frames from the deltas in PRs #369 (window C fix → 5) and #375 (window F fix → 0). The table doubles as a contract: if a future change forces the budget back up, exactly one bucket has regressed and the table tells you which one to investigate first.
- hdr-hlg-regression/README.md: add a 'Tolerance' section explaining why 0 is the right floor (HLG is a pure pass-through path, HEVC over rgb48le is byte-deterministic on the same fixture, so any drift is a real regression).
The regeneration command for generate-hdr-photo-pq.py was already documented at README lines 67-71, so no changes needed there.
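The budget arithmetic in this commit message can be written out as a small sketch. The bucket sizes below are an inference, not values quoted from the README table: they assume the budget goes 30 → 5 when #369 lands and 5 → 0 when #375 lands, which would put 25 frames in the window C bucket and 5 in the window F bucket. `BUDGET_BUCKETS` and `remaining_budget` are hypothetical names.

```python
# Assumed bucket sizes, inferred from "30 frames", "#369 (window C fix -> 5)"
# and "#375 (window F fix -> 0)". The real README table may split them differently.
BUDGET_BUCKETS = {
    "C": 25,  # window C known-fail, assumed to be removed by PR #369
    "F": 5,   # window F known-fail, assumed to be removed by PR #375
}


def remaining_budget(fixed_windows: set[str]) -> int:
    """Sum the buckets whose known-fail has not been fixed yet."""
    return sum(frames for window, frames in BUDGET_BUCKETS.items()
               if window not in fixed_windows)
```

Read this way, any later need to raise the budget maps back to a single bucket, which is the "contract" the commit message describes.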
c51e2a3 to 315ab25
315ab25 to 8c73295
3495f7a to 7334041
Replace the old hdr-pq + hdr-image-only tests with two consolidated regression suites that exercise the full HDR pipeline.

hdr-regression (PQ, BT.2020, ~20s):
- 8 windows (A-H) covering clip-only video, image+video composition, wrapper opacity, direct-on-video opacity, scene transitions, transform + border-radius, mid-clip cuts, and shader transitions.
- Reuses the existing hdr-clip.mp4 fixture (NOTICE.md preserved).
- New hdr-photo-pq.png generated via scripts/generate-hdr-photo-pq.py (writes a cICP chunk for BT.2020/PQ/full).

hdr-hlg-regression (HLG, ARIB STD-B67, ~5s):
- 2 windows (A-B) covering clip-only HLG playback and HLG + opacity tween.
- New hdr-hlg-clip.mp4 fixture (last 5s of a user-recorded HLG iPhone clip).

Both compositions follow the documented timed-element pattern: data-start, data-duration, and class="clip" applied directly to each timed leaf element (no wrapper inheritance).

CI: regression workflow's hdr shard now runs the new pair sequentially.
LFS: new MP4 fixtures and golden outputs are tracked via existing rules.

Goldens generated with bun run test:update --sequential. ffprobe verifies HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).

Made-with: Cursor
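The ffprobe verification mentioned in this commit message could be scripted roughly as below. This is a sketch, not the check used in the repo. It assumes ffprobe is on PATH and that the HLG clip shares the HEVC/yuv420p10le/bt2020nc properties with the PQ output, which the message does not state explicitly. `probe_video_stream` and `hdr_mismatches` are hypothetical helper names.

```python
import json
import subprocess

# Expected stream properties, taken from the commit message.
# Applying the first three to HLG as well is an assumption.
EXPECTED_COMMON = {"codec_name": "hevc", "pix_fmt": "yuv420p10le", "color_space": "bt2020nc"}
EXPECTED_TRANSFER = {"pq": "smpte2084", "hlg": "arib-std-b67"}


def probe_video_stream(path: str) -> dict:
    """Return ffprobe's metadata for the first video stream of `path`."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,pix_fmt,color_space,color_transfer",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"][0]


def hdr_mismatches(stream: dict, kind: str) -> list[str]:
    """List every property that differs from the expected PQ or HLG signature."""
    expected = {**EXPECTED_COMMON, "color_transfer": EXPECTED_TRANSFER[kind]}
    return [
        f"{key}: expected {want!r}, got {stream.get(key)!r}"
        for key, want in expected.items()
        if stream.get(key) != want
    ]
```

A golden would then be checked with something like `hdr_mismatches(probe_video_stream("output.mp4"), "pq")`, where the file name is a placeholder; an empty list means the stream carries the expected signature.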
Docker image builds can take 14-19 min on cache miss, leaving insufficient time for HDR and style regression tests within 40 min.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address jrusso1020's nit on PR #365 (non-blocking review): both READMEs now explain where the tolerance values come from.
- hdr-regression/README.md: add a budget-breakdown table that derives the 30 frames from the deltas in PRs #369 (window C fix → 5) and #375 (window F fix → 0). The table doubles as a contract: if a future change forces the budget back up, exactly one bucket has regressed and the table tells you which one to investigate first.
- hdr-hlg-regression/README.md: add a 'Tolerance' section explaining why 0 is the right floor (HLG is a pure pass-through path, HEVC over rgb48le is byte-deterministic on the same fixture, so any drift is a real regression).
The regeneration command for generate-hdr-photo-pq.py was already documented at README lines 67-71, so no changes needed there.
7334041 to 6b49885
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6b49885 to 35b9f0a

Summary
Replace the trivial hdr-pq and hdr-image-only tests with two consolidated, time-windowed regression suites that exercise the full HDR pipeline. These goldens are the safety net for every other PR in this stack.
Why
The pre-existing HDR tests covered only a single full-bleed video or image with a static text label, and none of the features that the HDR pipeline has to handle differently from SDR (opacity animation, z-ordered multi-layer compositing, transforms, border-radius clipping, shader transitions, multiple HDR sources, object-fit modes, mixed HDR+SDR layering, HLG transfer). This PR builds the missing safety net first so every subsequent fix can be proven correct.
What changed
- packages/producer/tests/hdr-regression/ (PQ, BT.2020, ~20 s, 1080p, 8 windows A–H):
  - <video> opacity tween (documents the Chunk 1 bug)
  - object-fit: contain
- packages/producer/tests/hdr-hlg-regression/ (HLG, ARIB STD-B67, ~5 s, 2 windows A–B): exercises the separate HLG LUT/OETF code path that previously had zero coverage.
- scripts/generate-hdr-photo-pq.py synthesizes hdr-photo-pq.png with a cICP chunk for BT.2020/PQ/full.
- Replaces tests/hdr-pq/ and tests/hdr-image-only/.
- .github/workflows/regression.yml: HDR shard now runs the new pair sequentially.
- Both compositions follow the documented timed-element pattern (data-start, data-duration, class="clip" directly on each timed leaf, no wrapper inheritance).
Test plan
- Goldens generated with bun run test:update --sequential.
- ffprobe confirms HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).
- maxFrameFailures budgets absorb the documented Chunk 1 / Chunk 4 known-fails and are tightened in follow-up PRs in this stack.
Stack
Foundational PR for the HDR follow-ups stack (Chunk 0 of plans/hdr-followups.md). Every subsequent PR builds on this safety net.
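For readers unfamiliar with the cICP chunk mentioned under "What changed", the sketch below shows how such a chunk could be written into a PNG. It is not the contents of scripts/generate-hdr-photo-pq.py, which this thread does not show. It assumes the standard H.273 code points (9 for BT.2020 primaries, 16 for the PQ transfer function, 0 for RGB matrix coefficients, 1 for full range); `png_chunk`, `cicp_chunk` and `insert_cicp` are hypothetical helper names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def cicp_chunk(primaries: int = 9, transfer: int = 16,
               matrix: int = 0, full_range: bool = True) -> bytes:
    """Build a cICP chunk; the defaults encode BT.2020 / PQ / RGB / full range."""
    return png_chunk(b"cICP", bytes([primaries, transfer, matrix, int(full_range)]))


def insert_cicp(png: bytes) -> bytes:
    """Insert a cICP chunk directly after IHDR, so it precedes the image data."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    ihdr_length = struct.unpack(">I", png[8:12])[0]
    # Signature (8 bytes) + IHDR length/type/CRC fields (12 bytes) + IHDR data.
    ihdr_end = 8 + 12 + ihdr_length
    return png[:ihdr_end] + cicp_chunk() + png[ihdr_end:]
```

Placing the chunk immediately after IHDR keeps it ahead of IDAT, where a decoder needs to see it. Whether the real script also writes 16-bit samples or other colour chunks is not stated here.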