test(producer): add hdr-regression and hdr-hlg-regression test suites by vanceingalls · Pull Request #365 · heygen-com/hyperframes

vanceingalls · 2026-04-21T17:53:20Z

Summary

Replace the trivial hdr-pq and hdr-image-only tests with two consolidated, time-windowed regression suites that exercise the full HDR pipeline. These goldens are the safety net for every other PR in this stack.

Why

The pre-existing HDR tests covered only a single full-bleed video or image with a static text label — none of the features that the HDR pipeline has to handle differently from SDR (opacity animation, z-ordered multi-layer compositing, transforms, border-radius clipping, shader transitions, multiple HDR sources, object-fit modes, mixed HDR+SDR layering, HLG transfer). This PR builds the missing safety net first so every subsequent fix can be proven correct.

What changed

- New packages/producer/tests/hdr-regression/ (PQ, BT.2020, ~20 s, 1080p, 8 windows A–H):
  - A: static baseline (HDR video + DOM overlay)
  - B: wrapper-opacity fade
  - C: direct-on-<video> opacity tween (documents the Chunk 1 bug)
  - D: z-order sandwich (DOM → HDR → DOM)
  - E: two HDR videos side-by-side (pins PR #289, feat(hdr): z-ordered multi-layer compositing with PQ support)
  - F: rotation + scale + border-radius (documents the Chunk 4 bug)
  - G: object-fit: contain
  - H: shader crossfade between HDR video and HDR image
- New packages/producer/tests/hdr-hlg-regression/ (HLG, ARIB STD-B67, ~5 s, 2 windows A–B), which exercises the separate HLG LUT/OETF code path that previously had zero coverage.
- New scripts/generate-hdr-photo-pq.py synthesizes hdr-photo-pq.png with a cICP chunk for BT.2020/PQ/full.
- Removed tests/hdr-pq/ and tests/hdr-image-only/.
- Updated .github/workflows/regression.yml HDR shard to run the new pair sequentially.
- All compositions follow the documented timed-element pattern (data-start, data-duration, class="clip" directly on each timed leaf, with no wrapper inheritance).
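
For readers unfamiliar with the cICP chunk mentioned above, here is a minimal sketch of how such a chunk can be serialized. It is not the contents of generate-hdr-photo-pq.py, which this PR does not reproduce; the function and constant names are illustrative assumptions. The code points (9 for BT.2020 primaries, 16 for PQ, 18 for HLG) come from ITU-T H.273, and the PNG specification requires matrix coefficients 0 for this chunk.

```python
import struct
import zlib

# Code points from ITU-T H.273, as used by the PNG cICP chunk.
BT2020_PRIMARIES = 9
PQ_TRANSFER = 16   # SMPTE ST 2084
HLG_TRANSFER = 18  # ARIB STD-B67
RGB_MATRIX = 0     # PNG requires matrix coefficients 0 (RGB)
FULL_RANGE = 1


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def cicp_chunk(primaries: int = BT2020_PRIMARIES,
               transfer: int = PQ_TRANSFER,
               matrix: int = RGB_MATRIX,
               full_range: int = FULL_RANGE) -> bytes:
    """Build a cICP chunk (hypothetical helper, not the script's actual code)."""
    return png_chunk(b"cICP", bytes([primaries, transfer, matrix, full_range]))


chunk = cicp_chunk()
print(chunk.hex())
```

In a real PNG file the chunk has to be written after IHDR and before PLTE and IDAT. Passing `transfer=HLG_TRANSFER` would tag an image as HLG instead of PQ.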

Test plan

- Goldens generated with bun run test:update --sequential.
- ffprobe confirms HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).
- Suite green with maxFrameFailures budgets that absorb the documented Chunk 1 / Chunk 4 known-fails; these budgets are tightened in follow-up PRs in this stack.
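
The ffprobe check above can be scripted. The following is a rough sketch, assuming ffprobe is on PATH; the helper names and the file path in the usage note are hypothetical and are not part of this PR. It reads the first video stream's metadata and reports any field that differs from the values listed in the test plan.

```python
import json
import subprocess

# Values taken from the test plan: HEVC, 10-bit 4:2:0, BT.2020 non-constant luminance.
EXPECTED_COMMON = {"codec_name": "hevc", "pix_fmt": "yuv420p10le", "color_space": "bt2020nc"}
EXPECTED_TRANSFER = {"pq": "smpte2084", "hlg": "arib-std-b67"}


def probe_video_stream(path: str) -> dict:
    """Return ffprobe's metadata for the first video stream of `path`."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,pix_fmt,color_space,color_transfer",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"][0]


def hdr_mismatches(stream: dict, kind: str) -> list[str]:
    """List every field that differs from the expected PQ or HLG signature."""
    expected = {**EXPECTED_COMMON, "color_transfer": EXPECTED_TRANSFER[kind]}
    return [f"{key}: expected {want!r}, got {stream.get(key)!r}"
            for key, want in expected.items() if stream.get(key) != want]
```

A call such as `hdr_mismatches(probe_video_stream("output.mp4"), "pq")` returns an empty list when the stream matches; the path here is a placeholder.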

Stack

Foundational PR for the HDR follow-ups stack (Chunk 0 of plans/hdr-followups.md). Every subsequent PR builds on this safety net.

vanceingalls · 2026-04-21T17:53:35Z

docs(engine): document __name polyfill and add regression test #385
perf(producer): cache transfer-converted hdr image buffers per render job #384
perf(producer): gate per-frame debug meta via optional isLevelEnabled #383
perf(producer): hdr benchmark harness — --tags filter, peak heap/RSS tracking, bench:hdr script #382
test(producer): extract frameDirMaxIndexCache to its own module and pin cross-job isolation #381
test(engine): cover spawnStreamingEncoder lifecycle and cleanup paths #380
test(engine): add ffprobe-unavailable fallback regression tests #379
test(shader-transitions): add midpoint (p=0.5) regression invariants for all shaders #378
test(engine): lock down sRGB→BT.2020 LUT with byte-exact reference values #377
build(lfs): track tests/*/src/*.png via Git LFS #376
test(hdr-regression): tighten Window F maxFrameFailures budget after Chunk 4 fix #375
fix(engine,shader): handle matrix3d transforms and hide non-first scenes #374
refactor(producer): extract HDR compositing helpers and rename media metadata #373
fix(producer): wire --crf and --video-bitrate CLI overrides into encoders #372
fix(producer): tighten resource lifecycle and harden file server #371
feat(engine): wire options.hdr through chunkEncoder + dynamic SDR→HDR transfer #370
test(hdr-regression): tighten Window C maxFrameFailures budget after Chunk 1 fix #369
fix(engine): stop clobbering native <video> opacity in HDR pipeline #368
refactor(shader-transitions): extract DEFAULT_DURATION and DEFAULT_EASE constants #367
refactor(types): tighten type safety, dedupe HfTransitionMeta, prune dead LUT export #366
test(producer): add hdr-regression and hdr-hlg-regression test suites #365 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

jrusso1020

Foundational correctness for the stack — every PR after this one leans on these goldens to prove its fix is real. Test design is the right shape:

Eight PQ windows each exercise an orthogonal code path the old hdr-pq fixture never touched (wrapper-opacity, direct-<video>-opacity, z-order sandwich, multi-HDR, transform+radius, object-fit, shader transition).
Splitting the HLG suite out as hdr-hlg-regression (2 windows) is the right call — HLG's OETF and LUT live on a separate code path from PQ, and lumping them into one suite would mask HLG regressions under PQ dominance.
Windows C and F are deliberately set up as documented known-fails with wide maxFrameFailures budgets, tightened in follow-up PRs in this stack (#369 for C, #375 for F). That's the right pattern for landing safety nets before fixes — the fix PR tightens the budget, which mechanically proves the fix.
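
To make the budget-tightening pattern concrete, here is a minimal sketch of how a frame-failure budget could gate a suite. The per-frame similarity score, the threshold, and the function names are assumptions for illustration; the harness's real comparison metric is not described in this PR.

```python
from typing import Sequence


def failing_frames(scores: Sequence[float], min_score: float) -> list[int]:
    """Indices of frames whose similarity to the golden falls below `min_score`."""
    return [i for i, score in enumerate(scores) if score < min_score]


def within_budget(scores: Sequence[float], min_score: float, max_frame_failures: int) -> bool:
    """A suite passes while the number of failing frames stays within the budget."""
    return len(failing_frames(scores, min_score)) <= max_frame_failures


# Hypothetical scores: two frames fall below the threshold.
scores = [0.99, 0.97, 0.80, 0.99, 0.75]
print(within_budget(scores, 0.95, max_frame_failures=2))  # True: wide known-fail budget
print(within_budget(scores, 0.95, max_frame_failures=0))  # False: tightened budget catches them
```

Under this model, a fix PR that lowers the budget only stays green if the failing frames have actually gone away.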

The regression-suite contract in the README (HDR frames reach the encoder in RGB48 BT.2020 10-bit without sRGB→BT.709 round-tripping, opacity respected on timed leaves, transforms don't clobber the HDR layer's pixel buffer) is the right set of invariants to pin.

Two non-blocking observations:

scripts/generate-hdr-photo-pq.py should live somewhere discoverable for future contributors. Right now it's tucked under tests/hdr-regression/scripts/ which is fine, but a one-line README.md note on "to regenerate, run python scripts/generate-hdr-photo-pq.py" is worth it so the synthesized golden doesn't become a black box.
maxFrameFailures budget values for the not-known-fail windows would benefit from a brief comment on where the tolerance came from (e.g. "allows up to N frame diffs from codec noise"). Without it, a future contributor tightening budgets can't tell what room is real and what's codec-jitter slack.

CI shows one failure on styles-a, unrelated to the new suites (different test shard) and likely a pre-existing flake on main.

Approved.

— Rames Jusso

vanceingalls · 2026-04-22T05:20:04Z

Both observations addressed in 75c6afa:

- generate-hdr-photo-pq.py discoverability: hdr-regression/README.md now points at the script in the regeneration block (lines 22-26), so it's not a black box anymore.
- maxFrameFailures budget rationale: hdr-hlg-regression/README.md got the new Tolerance section (maxFrameFailures: 0, with the reasoning that HLG is a pure pass-through and rgb48le → HEVC is byte-deterministic, so any drift is a real regression). For the PQ suite, the same commit also collapsed the README from 8 windows down to 4 (A–D) after #369 (window C re-baselined) and #375 (window F fixed) landed; the budget table I'd intended for the PQ side got dropped in that simplification. The current maxFrameFailures: 30 on the PQ suite is now under-documented. Happy to add a follow-up commit to put the budget breakdown back if you'd like, but I'm flagging it candidly here rather than back-justifying a number.

Thanks for the review.

Replace the old hdr-pq + hdr-image-only tests with two consolidated regression suites that exercise the full HDR pipeline.

hdr-regression (PQ, BT.2020, ~20s):
- 8 windows (A-H) covering clip-only video, image+video composition, wrapper opacity, direct-on-video opacity, scene transitions, transform + border-radius, mid-clip cuts, and shader transitions.
- Reuses the existing hdr-clip.mp4 fixture (NOTICE.md preserved).
- New hdr-photo-pq.png generated via scripts/generate-hdr-photo-pq.py (writes a cICP chunk for BT.2020/PQ/full).

hdr-hlg-regression (HLG, ARIB STD-B67, ~5s):
- 2 windows (A-B) covering clip-only HLG playback and HLG + opacity tween.
- New hdr-hlg-clip.mp4 fixture (last 5s of a user-recorded HLG iPhone clip).

Both compositions follow the documented timed-element pattern: data-start, data-duration, and class="clip" applied directly to each timed leaf element (no wrapper inheritance).

CI: regression workflow's hdr shard now runs the new pair sequentially.
LFS: new MP4 fixtures and golden outputs are tracked via existing rules.

Goldens generated with bun run test:update --sequential. ffprobe verifies HEVC/yuv420p10le/bt2020nc/smpte2084 (PQ) and arib-std-b67 (HLG).

Made-with: Cursor

Address jrusso1020's nit on PR #365 (non-blocking review): both READMEs now explain where the tolerance values come from.

- hdr-regression/README.md: add a budget-breakdown table that derives the 30 frames from the deltas in PRs #369 (window C fix → 5) and #375 (window F fix → 0). The table doubles as a contract: if a future change forces the budget back up, exactly one bucket has regressed and the table tells you which one to investigate first.
- hdr-hlg-regression/README.md: add a 'Tolerance' section explaining why 0 is the right floor (HLG is a pure pass-through path, HEVC over rgb48le is byte-deterministic on the same fixture, so any drift is a real regression).

The regeneration command for generate-hdr-photo-pq.py was already documented at README lines 67-71, so no changes needed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vanceingalls marked this pull request as ready for review April 21, 2026 20:55

vanceingalls marked this pull request as draft April 21, 2026 20:55

vanceingalls marked this pull request as ready for review April 21, 2026 20:57

vanceingalls force-pushed the vance/hdr-regression-tests branch from a54acd2 to ebe1483 Compare April 22, 2026 02:03

jrusso1020 approved these changes Apr 22, 2026

vanceingalls force-pushed the vance/hdr-regression-tests branch from 04e4040 to b03abc3 Compare April 22, 2026 05:09

vanceingalls force-pushed the vance/hdr-regression-tests branch 2 times, most recently from 632e75f to 3734a6c Compare April 22, 2026 06:23

vanceingalls force-pushed the vance/hdr-regression-tests branch from 3734a6c to e9cfefd Compare April 22, 2026 16:29

vanceingalls force-pushed the vance/hdr-regression-tests branch from c51e2a3 to 315ab25 Compare April 22, 2026 17:07

github-advanced-security AI found potential problems Apr 22, 2026

vanceingalls force-pushed the vance/hdr-regression-tests branch from 315ab25 to 8c73295 Compare April 22, 2026 17:12

github-advanced-security AI found potential problems Apr 22, 2026

vanceingalls force-pushed the vance/hdr-regression-tests branch 3 times, most recently from 3495f7a to 7334041 Compare April 22, 2026 18:59

vanceingalls and others added 3 commits April 22, 2026 12:30

ci: bump regression shard timeout to 60 minutes

10bb6e3

Docker image builds can take 14-19 min on cache miss, leaving insufficient time for HDR and style regression tests within 40 min. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vanceingalls force-pushed the vance/hdr-regression-tests branch from 7334041 to 6b49885 Compare April 22, 2026 19:34

test(hdr-regression): regenerate golden baseline

35b9f0a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vanceingalls force-pushed the vance/hdr-regression-tests branch from 6b49885 to 35b9f0a Compare April 22, 2026 20:48

vanceingalls wants to merge 4 commits into main from vance/hdr-regression-tests

3 participants
