[feat] bump to 1.2.0 with torch 2.11 / torchrec 1.6 / fbgemm 1.6 #479
- tzrec 1.1.11 -> 1.2.0
- torch 2.10.0 -> 2.11.0
- torchrec 1.5.0 -> 1.6.0 (switch wheel source to tzrec OSS repo.html)
- fbgemm-gpu 1.5.0 -> 1.6.0
- torch-tensorrt 2.10.0 -> 2.11.0, now also available for cu126
- dynamicemb 0.0.1+20260407.97b80bf -> 0.1.0+20260420.c7b9ea2
- hstu_attn 0.1.0+bea6b4b -> 0.1.0+c7b9ea2
- Docker tag 1.1 -> 1.2, staged via new tzrec-test repo before promoting to tzrec-devel after CI passes (promote_docker.sh added)
- pre-commit: ruff v0.15.4 -> v0.15.11, codespell v2.4.1 -> v2.4.2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pip.conf: add timeout=120 and retries=5 to tolerate transient mirrors.aliyun.com network blips during Dockerfile pip install steps
- build_docker.sh: add set -o pipefail and remove the duplicate shebang so docker build failures are surfaced instead of being swallowed by tee
Wheel downloads from mirrors.aliyun.com/pytorch-wheels occasionally fail mid-stream with ReadTimeoutError even with pip's own retries and timeout bumped. Wrap the torch/torchrec/fbgemm pip install commands in an 8-attempt shell retry loop so transient registry blips don't abort a 40 GB image build.
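The retry idea can be sketched in Python for illustration. The PR used a shell loop inside the Dockerfile, so the helper names `run_with_retries` and `pip_install` and the optional pause between attempts are hypothetical; only the attempt count (8) comes from the commit message.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_with_retries(
    run: Callable[[], int], attempts: int = 8, delay_s: float = 0.0
) -> int:
    """Call ``run`` until it returns exit code 0 or ``attempts`` is used up.

    Returns 0 on success, otherwise the exit code of the last attempt.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = run()
        if code == 0:
            return 0
        if attempt < attempts:
            # Hypothetical fixed pause between attempts; the PR does not specify one.
            time.sleep(delay_s)
    return code


def pip_install(args: Sequence[str]) -> int:
    """Run a single ``pip install`` with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-m", "pip", "install", *args]).returncode
```

A call such as `run_with_retries(lambda: pip_install(["torch==2.11.0"]), attempts=8, delay_s=5.0)` retries only on a non-zero exit code, so a successful first download costs nothing extra.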
- tzrec/utils/plan_util.py: torchrec 1.6's hardware-aware perf estimator
now raises ValueError for any compute kernel not in its internal
kernel_bw_lookup. Dynamicemb registers CUSTOMIZED_KERNEL which is not
in that table, so when dynamicemb is loaded we inject a bandwidth
override for ("cuda", "customized_kernel") on the
EmbeddingPerfEstimator's HardwarePerfConfig, approximated as a
fused_uvm_caching-like mix of HBM and HBM-to-DDR bandwidth.
- tzrec/ops/_triton/triton_hstu_attention.py: the triton 3.6 shipped
with torch 2.11 no longer resolves libdevice.fast_dividef for
(float64, float64). Replace the two silu formulations
(fast_dividef(qk, 1 + exp(-qk))) with the mathematically equivalent
qk * tl.sigmoid(qk), which keeps the dtype flow consistent and also
avoids the libdevice import.
- .github/workflows/{unittest,buildtest,benchmark,unittest_nightly}_ci.yml:
add --ulimit memlock=-1 to the GPU container options. dynamicemb 0.1.0
mlocks physical memory for its HKV cache tables; the default 64 KB
rlimit in the GitHub Actions runner containers made its prefetch
path raise "mlock physical memory failed".
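The silu rewrite in the triton_hstu_attention.py bullet relies on the identity qk / (1 + exp(-qk)) = qk * sigmoid(qk). A quick scalar check with plain Python floats (not Triton tensors; the function names are illustrative) confirms the two formulations agree. Inputs are kept moderate because `math.exp` overflows for very negative arguments.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu_old(qk: float) -> float:
    # Old form: qk / (1 + exp(-qk)); the kernel computed this division with libdevice.fast_dividef.
    return qk / (1.0 + math.exp(-qk))


def silu_new(qk: float) -> float:
    # New form: qk * sigmoid(qk); the kernel now uses tl.sigmoid instead.
    return qk * sigmoid(qk)
```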
CI has passed against tzrec-test:1.2-* and the images have been promoted to
tzrec-devel:1.2-{cpu,cu126,cu129}, so switch the 8 workflow YAMLs back
to tzrec-devel. This is the merge-ready state of the branch.
Revert the shell-level pip retry loops in the Dockerfile. They were added to tolerate a transient mirrors.aliyun.com outage during the 1.2.0 bump build but aren't needed in steady state. pip.conf still sets timeout=120 so individual requests don't hang indefinitely; the retries=5 pin is dropped as well.
After auditing the torchrec 1.6 estimator surface (commits 8fe63b34
+ 005fd685 + 1857d90c) and dynamicemb 0.1.0's own planner overrides,
several pieces of tzrec's dynamicemb integration are dead code or
silently bypass the new APIs.
dynamicemb_util.py:
- Drop the GroupedEmbeddingsLookup / GroupedPooledEmbeddingsLookup
_create_embedding_kernel re-binds. dynamicemb 0.1.0 already provides
these overrides verbatim
(recsys-examples@c7b9ea2:corelib/dynamicemb/dynamicemb/planner/
rw_sharding.py:55-186) with a runtime torchrec-version check; the
tzrec re-binds shadow them and lose dynamicemb's <1.5 fallback.
- Drop the shard_estimators.kernel_bw_lookup monkey-patch. After PR
#3723's legacy-estimator cleanup, shard_estimators no longer imports
kernel_bw_lookup, so the patch is silent dead code. The CUSTOMIZED_KERNEL
bandwidth override now lives on the EmbeddingPerfEstimator config
(see plan_util.py).
- Drop the now-unused imports (BaseEmbedding, GroupedEmbeddingConfig,
ShardingEnv, dist, constants, and the dynamicemb lookup classes).
plan_util.py:
- Replace the static kernel_device_bandwidths dict with a method-level
HardwarePerfConfig.get_device_bw override that respects the per-shard
caching_ratio. The dict approach pinned a single bandwidth across all 10
cache_load_factor variants that enumerate emits per dynamicemb table; the
method override restores 1.5-equivalent fidelity by computing
(caching_ratio * hbm + (1 - caching_ratio) * hbm_to_ddr) / 10 per shard.
Adds a hasattr assert on the private _estimator/_config attribute
chain so a future torchrec rename surfaces a clear error instead of
silently regressing.
- Wire forward-compat for torchrec 1.6 stable / nightly:
* populate self._sharder_data_map = build_sharder_data_map(...) in
enumerate (no-op in v1.6.0-rc1; required after post-rc1 commits
b0027133 / 25b9b5ff first tagged in v2026.03.30.00).
* compute num_buckets via a new _get_num_buckets helper and pass it
to calculate_shard_sizes_and_offsets, mirroring upstream's
virtual-table sharding plumbing (enumerators.py:206, 228).
* thread sharder_key through _filter_sharding_types and apply the new
GUARDED_SHARDING_TYPES_FOR_FP_MODULES filter for
FeatureProcessedEmbeddingBagCollection, matching upstream
enumerators.py:344-365.
- In EmbeddingStorageEstimator, switch sharder_key lookup to
ShardingOption.module_type_key (precomputed in 1.6 at types.py:1175;
post-rc1 PR #3917 removes the legacy sharder_name(type(...)) shape).
Verified: all 5 dynamicemb / HSTU / RTP integration tests pass locally
in tzrec-test:1.2-cu129
(create_dynamicemb_init_ckpt + multi_tower_din_with_dynamicemb_train_eval
+ rank_dlrm_hstu_train_eval_export {AOT,unified_aot} +
multi_tower_din_rtp_train_export).
# Conflicts:
#   tzrec/ops/_triton/triton_hstu_attention.py
#   tzrec/version.py
Class-level monkey-patch on HardwarePerfConfig.get_device_bw, applied once at dynamicemb_util module load alongside the other dynamicemb patches. Drops the per-planner _estimator/_config private-attribute walk and the hasattr assert in plan_util.py. Also tightens the comments added in this PR to one line each.
CI consoles don't upload the per-rank stderr files that test_train_eval / test_export / test_predict write, so a torchrun subprocess failure percolates up to ``self.assertTrue(self.success)`` as an opaque ``False is not true`` with no actual error to diagnose. Print a tail of the failing log file (last 80 lines) right where run_cmd returns False, so future CI failures include the underlying exception in the workflow log.
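Printing the last 80 lines of a log can be done without reading the whole file into memory by streaming it through a bounded `collections.deque`. The helper below is a hypothetical sketch, not the code that landed in run_cmd; it also tolerates a missing log file so the diagnostic itself never masks the original failure.

```python
import collections
from pathlib import Path


def tail_log(path: str, n: int = 80) -> str:
    """Return the last ``n`` lines of ``path``, or a placeholder if it is missing."""
    p = Path(path)
    if not p.exists():
        return f"<log file {path} not found>"
    # errors="replace" keeps binary garbage in a crashed rank's log from raising.
    with p.open("r", errors="replace") as f:
        # A deque with maxlen keeps only the final n lines while iterating.
        return "".join(collections.deque(f, maxlen=n))
```

At the failure site, something like `if not success: print(tail_log(log_file))` would surface the underlying traceback in the workflow log.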
```diff
 self._sharder_map = {
     sharder_name(sharder.module_type): sharder for sharder in sharders
 }
+self._sharder_data_map = build_sharder_data_map(self._sharder_map)
```
self._sharder_data_map is assigned here but never read anywhere in the repo (verified by grep). If torchrec doesn't reflect on this attribute, this and the import on line 66 can be dropped; if it is required, please add a comment so it isn't deleted later.
```diff
 _constraints, key = self._get_constraints(child_path, name)
-# GRID_SHARD is only supported if specified by user in parameter constraints
+# GRID_SHARD and row-wise on FP modules require explicit opt-in.
+is_fp_module = "FeatureProcessedEmbeddingBagCollection" in sharder_key
```
Substring matching on the sharder key is brittle: any user-defined wrapper named MyFeatureProcessedEmbeddingBagCollectionXxx matches, while a renamed FP subclass (e.g. FPEBC) silently misses the guard. Since child_module is in scope, prefer isinstance(child_module, FeatureProcessedEmbeddingBagCollection) (importable from torchrec.modules.fp_embedding_modules) and pass that bool through.
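The difference between the two checks can be shown with stand-in classes so the example runs without torchrec. All classes here are hypothetical placeholders, and `is_fp_by_name` approximates the PR's check by matching on the class name rather than on the full sharder key.

```python
class FeatureProcessedEmbeddingBagCollection:  # stand-in for the torchrec class
    pass


class FPEBC(FeatureProcessedEmbeddingBagCollection):  # renamed FP subclass
    pass


class MyFeatureProcessedEmbeddingBagCollectionXxx:  # unrelated wrapper, not an FP module
    pass


def is_fp_by_name(module: object) -> bool:
    # Substring test, analogous to `"FeatureProcessedEmbeddingBagCollection" in sharder_key`.
    return "FeatureProcessedEmbeddingBagCollection" in type(module).__name__


def is_fp_by_type(module: object) -> bool:
    # Type test, as suggested in the review.
    return isinstance(module, FeatureProcessedEmbeddingBagCollection)
```

The name test misses `FPEBC` and wrongly flags the unrelated wrapper, while `isinstance` gets both right.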
```diff
+def _customized_kernel_aware_get_device_bw(
+    self,  # pyre-ignore [2]
     compute_device: str,
     compute_kernel: str,
     hbm_mem_bw: float,
     ddr_mem_bw: float,
+    ssd_mem_bw: float,
     hbm_to_ddr_mem_bw: float,
     caching_ratio: Optional[float] = None,
     prefetch_pipeline: bool = False,
 ) -> Optional[float]:
-    """Calculates the device bandwidth.
-
-    Args:
-        compute_kernel (str): compute kernel.
-        compute_device (str): compute device.
-        hbm_mem_bw (float): the bandwidth of the device HBM.
-        ddr_mem_bw (float): the bandwidth of the system DDR memory.
-        hbm_to_ddr_mem_bw (float): the bandwidth between device HBM and system DDR.
-        caching_ratio (Optional[float]): caching ratio used to determine device
-            bandwidth if UVM caching is enabled.
-        prefetch_pipeline (bool): whether prefetch pipeline is enabled.
-
-    Returns:
-        Optional[float]: the device bandwidth.
-    """
     if compute_kernel == EmbeddingComputeKernel.CUSTOMIZED_KERNEL.value:
-        # for dynamic embedding table
-        caching_ratio = caching_ratio if caching_ratio else 0.0
-        return (
-            caching_ratio * hbm_mem_bw + (1 - caching_ratio) * hbm_to_ddr_mem_bw
-        ) / 10
-    else:
-        return constants.kernel_bw_lookup(
-            compute_device=compute_device,
-            compute_kernel=compute_kernel,
-            hbm_mem_bw=hbm_mem_bw,
-            ddr_mem_bw=ddr_mem_bw,
-            hbm_to_ddr_mem_bw=hbm_to_ddr_mem_bw,
-            caching_ratio=caching_ratio,
-            prefetch_pipeline=prefetch_pipeline,
-        )
+        cr = caching_ratio if caching_ratio is not None else 0.0
+        return (cr * hbm_mem_bw + (1 - cr) * hbm_to_ddr_mem_bw) / 10
+    return _orig_hw_perf_config_get_device_bw(
+        self,
+        compute_device,
+        compute_kernel,
+        hbm_mem_bw,
+        ddr_mem_bw,
+        ssd_mem_bw,
+        hbm_to_ddr_mem_bw,
+        caching_ratio,
+        prefetch_pipeline,
+    )


 # pyre-ignore [9]
-shard_estimators.kernel_bw_lookup = _kernel_bw_lookup
+HardwarePerfConfig.get_device_bw = _customized_kernel_aware_get_device_bw
```
Two concerns on this monkey-patch:
- Signature drift risk. Hard-coded positional forwarding (including the new `ssd_mem_bw`) means any future torchrec change, e.g. another memory tier or a renamed kwarg, will fail at planning time with an opaque `TypeError`. Consider `def _customized_kernel_aware_get_device_bw(self, *args, **kwargs)` with a kwarg/positional lookup for `compute_kernel` / `hbm_mem_bw` / `hbm_to_ddr_mem_bw` / `caching_ratio`, then `return _orig_hw_perf_config_get_device_bw(self, *args, **kwargs)` for the non-customized path.
- Lost docstring. The replaced `_kernel_bw_lookup` had a full Google-style docstring; the replacement has none, despite the new `ssd_mem_bw` parameter. Project convention asks for docstrings on non-test functions; please restore one explaining the customized-kernel formula `(cr * hbm + (1 - cr) * hbm_to_ddr) / 10` and the `/10` factor in particular.
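One way to implement the suggested `*args, **kwargs` wrapper is to bind the incoming arguments against the original method's signature with `inspect.signature(...).bind`, which resolves positional and keyword arguments by name. The sketch below uses a stand-in `HardwarePerfConfig` so it runs without torchrec; the stand-in's body and the `CUSTOMIZED_KERNEL` string (taken from the `("cuda", "customized_kernel")` key mentioned earlier) are assumptions.

```python
import inspect
from typing import Any, Optional

# Assumed to equal EmbeddingComputeKernel.CUSTOMIZED_KERNEL.value.
CUSTOMIZED_KERNEL = "customized_kernel"


class HardwarePerfConfig:
    """Stand-in for torchrec's HardwarePerfConfig; only the signature matters."""

    def get_device_bw(
        self,
        compute_device: str,
        compute_kernel: str,
        hbm_mem_bw: float,
        ddr_mem_bw: float,
        ssd_mem_bw: float,
        hbm_to_ddr_mem_bw: float,
        caching_ratio: Optional[float] = None,
        prefetch_pipeline: bool = False,
    ) -> Optional[float]:
        return hbm_mem_bw  # placeholder for the real kernel lookup


_orig_get_device_bw = HardwarePerfConfig.get_device_bw
_orig_signature = inspect.signature(_orig_get_device_bw)


def _customized_kernel_aware_get_device_bw(
    self: Any, *args: Any, **kwargs: Any
) -> Optional[float]:
    """Device bandwidth that also understands CUSTOMIZED_KERNEL.

    Arguments are matched by name against the original signature, so reordered
    or newly added (defaulted) upstream parameters are tolerated; a rename of one
    of the four names read below fails with a KeyError naming it. For
    CUSTOMIZED_KERNEL it returns (cr * hbm + (1 - cr) * hbm_to_ddr) / 10;
    everything else is forwarded unchanged to the original method.
    """
    bound = _orig_signature.bind(self, *args, **kwargs)
    bound.apply_defaults()
    a = bound.arguments
    if a["compute_kernel"] == CUSTOMIZED_KERNEL:
        cr = a["caching_ratio"] if a["caching_ratio"] is not None else 0.0
        return (cr * a["hbm_mem_bw"] + (1 - cr) * a["hbm_to_ddr_mem_bw"]) / 10
    return _orig_get_device_bw(self, *args, **kwargs)


HardwarePerfConfig.get_device_bw = _customized_kernel_aware_get_device_bw
```

The non-customized path forwards `*args, **kwargs` untouched, so it never needs updating when upstream adds a parameter.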
```bash
DOCKER_TAG=1.2
DOCKER_TAG_SUFFIX=

for DEVICE in cpu cu126 cu129
do
```
The promote step pulls tzrec-test:<tag>-<device> by mutable tag, so any push to the test repo between CI passing and promote running will be promoted. To make "what was tested" and "what was promoted" the same artifact, capture the digest at the end of CI (docker inspect --format '{{index .RepoDigests 0}}') and pull that @sha256:... here. The current path is acceptable for a single human-driven promotion but invites a TOCTOU footgun if this is ever automated.
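If the promotion is ever automated, the promote side could refuse anything that is not an immutable `repo@sha256:<digest>` reference. The helper below is a hypothetical sketch; the digest string itself would come from `docker inspect --format '{{index .RepoDigests 0}}'` at the end of CI, as the review suggests.

```python
import re

# repo part: anything without "@" or whitespace (may include a registry host and port).
_DIGEST_RE = re.compile(r"^(?P<repo>[^@\s]+)@(?P<digest>sha256:[0-9a-f]{64})$")


def pinned_pull_ref(repo_digest: str) -> str:
    """Validate a RepoDigests entry and return it for use with `docker pull`.

    Raises ValueError for mutable tag references such as `repo:1.2-cu129`.
    """
    m = _DIGEST_RE.match(repo_digest.strip())
    if m is None:
        raise ValueError(f"not a repo@sha256 digest reference: {repo_digest!r}")
    return f"{m.group('repo')}@{m.group('digest')}"
```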
```python
    KVCounter,
    align_to_table_size,
)
from dynamicemb.batched_dynamicemb_compute_kernel import (
    BatchedDynamicEmbedding,
    BatchedDynamicEmbeddingBag,
)
from dynamicemb.dynamicemb_config import DynamicEmbKernel
from dynamicemb.planner import (
    DynamicEmbParameterConstraints,
    DynamicEmbParameterSharding,
)
```
Removing the `GroupedEmbeddingsLookup` / `GroupedPooledEmbeddingsLookup._create_embedding_kernel` monkey-patches drops two pieces of behavior that the previous version explicitly set:
1. `BatchedDynamicEmbedding` / `BatchedDynamicEmbeddingBag` instantiation for `EmbeddingComputeKernel.CUSTOMIZED_KERNEL`.
2. `self._need_prefetch = True` on the lookup; grep shows no other site sets this in tzrec.

If dynamicemb 0.1.0 or torchrec 1.6 now provides this natively, please mention it in the PR description so a future reader doesn't bisect to here. Otherwise this is a silent regression for dynamic embedding tables.
```diff
 if not _constraints or not _constraints.get(key):
-    return [
+    filtered = [
         t for t in allowed_sharding_types if t != ShardingType.GRID_SHARD.value
     ]
+    if is_fp_module:
+        filtered = [
+            t
+            for t in filtered
+            if t not in GUARDED_SHARDING_TYPES_FOR_FP_MODULES
+        ]
+    return filtered
 constraints: ParameterConstraints = _constraints[key]
 if not constraints.sharding_types:
-    return [
+    filtered = [
         t for t in allowed_sharding_types if t != ShardingType.GRID_SHARD.value
     ]
+    if is_fp_module:
+        filtered = [
+            t
+            for t in filtered
+            if t not in GUARDED_SHARDING_TYPES_FOR_FP_MODULES
+        ]
+    return filtered
```
Two issues:
- The "drop GRID_SHARD; if FP, drop GUARDED_SHARDING_TYPES_FOR_FP_MODULES" block is repeated verbatim in both branches, which makes it easy to drift on the next torchrec bump. A small `_drop_guarded(types, is_fp)` helper would deduplicate it.
- The third branch (when the user supplies an explicit `constraints.sharding_types`, just below this block) does not apply the FP guard. If a user constrains an FP module to a guarded type like `ROW_WISE`, it passes through. This may be intentional ("explicit user opt-in overrides the safety filter"); please add an inline comment stating so, otherwise it reads like an oversight.
Review summary
The version bump itself is mostly mechanical and clean (Dockerfile, requirements, image tags, doc strings). The torchrec-1.6 adapter changes in
A few non-inline observations:
Nothing blocking; main asks are (a) confirm in the PR description that removing the
oss-accelerate.aliyuncs.com is the global-accelerated CDN endpoint and gives faster, more reliable downloads (especially for the large fbgemm_gpu / torchrec / libidn11 / Miniforge / cuda-keyring artifacts) than the regional oss-cn-beijing.aliyuncs.com endpoint we were using. The bucket and key paths are identical (only the hostname changes), so existing wheel and asset URLs keep working.
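Because only the hostname changes, the rewrite amounts to swapping the regional endpoint suffix while keeping the bucket prefix, path, and query. A sketch using `urllib.parse`, assuming virtual-hosted-style URLs of the form `<bucket>.oss-cn-beijing.aliyuncs.com` (as in the tzrec repo.html URL); the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

REGIONAL_HOST = "oss-cn-beijing.aliyuncs.com"
ACCELERATED_HOST = "oss-accelerate.aliyuncs.com"


def to_accelerated(url: str) -> str:
    """Point a <bucket>.oss-cn-beijing URL at the accelerated endpoint; leave others unchanged."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.endswith("." + REGIONAL_HOST):
        bucket = host[: -len(REGIONAL_HOST) - 1]  # strip ".oss-cn-beijing.aliyuncs.com"
        parts = parts._replace(netloc=f"{bucket}.{ACCELERATED_HOST}")
    return urlunsplit(parts)
```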
- requirements/runtime.txt: bump pyfg pin (cp310/cp311/cp312 wheels)
- docs/feature.md: add ExprFeature isnan (new in 1.0.5), mod, corr; drop duplicate sigmoid
- docs/feature.md: extend CombineFeature/LookupFeature combiner enum with count/avg/gap_min/gap_max
- docs/feature.md: note MatchFeature MAP<K, string> input support
- tzrec/features/tokenize_feature.py: omit output_delim in grouped-sequence path; pyfg 1.0.5 expects the inner tokenize feature to emit per-token outputs and rejects output_delim there
Standalone TokenizeFeature parses fine without output_delim too, so the grouped-sequence branch is unnecessary. Simplifies the previous commit.
Follow-up to dropping output_delim from TokenizeFeature._fg_json — update
the expected dicts in feature_test.test_create_fg_json{,_remove_bucketizer}
so they match the new output. Caught by CI on PR alibaba#489.
After cherry-picking the pyfg 1.0.4 -> 1.0.5 bump and the matching TokenizeFeature output_delim drop onto bump/tzrec-1.2.0, point CI back at the staging tzrec-test:1.2 images so the next workflow run validates the freshly-rebuilt containers against the source tree before we promote them to tzrec-devel:1.2.
CI on tzrec-test:1.2 has passed for the pyfg 1.0.5 + TokenizeFeature
changes; promote_docker.sh has retagged + pushed
tzrec-devel:1.2-{cpu,cu126,cu129} (plus the 1.2 and latest aliases) to
the same digests. Switch the 8 workflow YAMLs back to tzrec-devel:1.2 so
the merged master points at the promoted repo.
# Conflicts:
#   tzrec/version.py
DOCKER_TAG_SUFFIX is a per-build marker on the staging tzrec-test images (e.g. an "-rc1" tail used during a release candidate cycle). When promoting to tzrec-devel we want the suffix stripped so consumers see clean tags like tzrec-devel:1.2-cu129 / 1.2 / latest, not tzrec-devel:1.2-cu129-rc1. Apply the suffix only to the SRC pull and omit it from every DST tag/push line.
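The tag rule (suffix on the source pull only, never on destination tags) can be captured in a pure function. This is a hypothetical Python sketch of what promote_docker.sh does in shell; which devices also receive the `1.2` and `latest` aliases is left to the caller via `extra_tags`, since the commit does not spell that out.

```python
from typing import List, Sequence, Tuple


def promote_refs(
    src_repo: str,
    dst_repo: str,
    tag: str,
    device: str,
    suffix: str = "",
    extra_tags: Sequence[str] = (),
) -> Tuple[str, List[str]]:
    """Return (source ref to pull, destination refs to tag and push)."""
    # The staging suffix (e.g. "-rc1") only identifies which image to pull.
    src = f"{src_repo}:{tag}-{device}{suffix}"
    # Destination tags never carry the suffix.
    dsts = [f"{dst_repo}:{tag}-{device}"]
    dsts += [f"{dst_repo}:{t}" for t in extra_tags]
    return src, dsts
```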
torch_tensorrt==2.11.0 is now available for cu126 too (no longer cu129-only since the 1.2.0 bump), so the cu126 image ships TensorRT just like cu129. Strip the stale parenthetical from the local-tutorial docker section.
The cu129 PyTorch wheel is no longer compiled with sm_70/sm_60 SASS, so running the cu129 image on Tesla V100 / P100 / P40 (CC 7.0 / 6.x) triggers the runtime warning ``Found GPU0 ... CC 7.0``, and every CUDA kernel launch then fails. Add a 注意 ("Note") block under the docker image variant list in local_tutorial.md pointing those users at the cu126 image.
Replaced the bare "cu129 needs CC ≥ 7.5" note with the exact torch.cuda.get_arch_list() output of each image:
- cu129: sm_75 / 80 / 86 / 90 / 100 / 120 + compute_120 PTX (T4, A10/A30/A100, L4/L20, H100/H200, B100/B200; **no V100/P100**)
- cu126: sm_50 / 60 / 70 / 75 / 80 / 86 / 90 (Pascal/Volta/Turing/Ampere/Hopper; **no Blackwell**)

so users can pick the right image up front instead of hitting "Found GPU0 ... CC 7.0" at runtime.
fbgemm-gpu (the sparse-embedding kernel library tzrec relies on) no longer ships sm_50/sm_60 SASS, so Pascal (P100/P40) and Maxwell cards fail at the embedding kernel even though stock PyTorch advertises them in get_arch_list. Tighten the doc to cu126 = sm_70/75/80/86/90 (Volta through Hopper) and call out the Pascal caveat explicitly.
Trim the cu126 bullet to the supported CC range; the fbgemm-gpu Pascal caveat was redundant with the sm_70+ list right above it.
Trim the cu129 bullet to the supported CC list; the unsupported-card caveat was redundant with the cu126 bullet right below it covering Volta (V100) and the Pascal note that came earlier.
Both image bullets now follow the "Volta (V100)、Turing (T4)、…" arch-first format with explicit example cards in parentheses, instead of mixing raw card lists in cu129 with arch-name format in cu126.
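Given the arch lists in these commits (cu126 tightened to sm_70+ for fbgemm-gpu), choosing an image for a given GPU comes down to CUDA's binary-compatibility rule: SASS built for sm_XY runs on devices with the same major version and an equal or higher minor, and PTX can be JIT-compiled for any equal or newer device. The sketch below encodes that rule; the capability tuple would come from `torch.cuda.get_device_capability()`, and preferring cu129 when both images work is a hypothetical policy, not something the docs state.

```python
from typing import Iterable, Optional, Tuple

# Architecture lists as stated in the commit messages above.
CU129_SASS = (75, 80, 86, 90, 100, 120)
CU129_PTX = (120,)
CU126_SASS = (70, 75, 80, 86, 90)  # after the fbgemm-gpu tightening to sm_70+


def can_run(cc: Tuple[int, int], sass: Iterable[int], ptx: Iterable[int] = ()) -> bool:
    major, minor = cc
    # SASS for sm_XY runs on devices with the same major and an equal or higher minor.
    if any(s // 10 == major and s % 10 <= minor for s in sass):
        return True
    # PTX for compute_XY can be JIT-compiled for any device with CC >= X.Y.
    return any(p <= major * 10 + minor for p in ptx)


def pick_image(cc: Tuple[int, int]) -> Optional[str]:
    """Return "cu129", "cu126", or None if neither image supports the GPU."""
    if can_run(cc, CU129_SASS, CU129_PTX):
        return "cu129"
    if can_run(cc, CU126_SASS):
        return "cu126"
    return None
```

For example, a V100 (CC 7.0) maps to cu126, while an L4 (CC 8.9) maps to cu129 via the sm_86 SASS.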
fbgemm-gpu wheel was silently updated at the existing OSS URL, so docker images need a refresh. The CACHE_BUST_PIP arg busts just the torch/fbgemm/torchrec RUN layer (apt + conda + cuda toolkit layers stay cached).
- Pin tensorrt_cu12==10.15.1.29 in step 12 (matches torch_tensorrt 2.11.0's `tensorrt-cu12<10.16.0,>=10.15.1`); the requirements step no longer triggers a second tensorrt install.
- Strip `tensorrt` (broad) from torch_tensorrt's METADATA so the bare `Requires-Dist: tensorrt<10.16.0,>=10.15.1` line, which pulled in tensorrt_cu13_libs (~3.7 GB) on top, is neutralized.
- Strip `cuda-toolkit` extras from torch's METADATA so step 19 doesn't re-resolve the 10 nvidia-* wheels we uninstalled in step 12.
- Drop the 1.65 GB tensorrt_libs/libnvinfer_builder_resource_win_*.so.* (PE/Windows binaries shipped under .so for wheel-format compliance).
- Run pip cache purge in step 19 to free /root/.cache/pip.
- Generate pip.conf at build time via ARG PIP_MIRROR (default: mirrors.cloud.aliyuncs.com, override with --build-arg PIP_MIRROR=mirrors.aliyun.com); revert to the public mirror at the last RUN so end-user images still resolve.
- Add a trailing slash on pytorch-wheels find-links URLs to avoid the 301 to mirrors.aliyun.com.
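Neutralizing a `Requires-Dist` line means dropping only entries whose distribution name matches exactly, so `tensorrt` goes while `tensorrt-cu12` stays. A Python sketch of that filter over METADATA lines, with names compared after PEP 503 normalization; the function names are hypothetical and this is not the Dockerfile's actual mechanism.

```python
import re
from typing import Iterable, List

# Captures the distribution name at the start of a Requires-Dist value.
_REQ_NAME = re.compile(r"^Requires-Dist:\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503 normalization: case-insensitive, runs of "-", "_", "." collapse to "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_requires_dist(lines: Iterable[str], drop: Iterable[str]) -> List[str]:
    """Remove Requires-Dist lines whose distribution name is in ``drop``."""
    drop_norm = {_normalize(d) for d in drop}
    out = []
    for line in lines:
        m = _REQ_NAME.match(line)
        if m and _normalize(m.group(1)) in drop_norm:
            continue
        out.append(line)
    return out
```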
# Conflicts:
#   tzrec/utils/misc_util.py
#   tzrec/version.py
pyfg wheel was silently updated at the existing OSS URL, so the requirements layer needs to be busted. ARG CACHE_BUST_REQ on the step-19 RUN forces the layer to rebuild and pull the new wheel content; layers above it (apt + conda + cuda-toolkit + torch + fbgemm + torchrec) stay cached.
Summary
Coordinated upgrade of the PyTorch stack and companion accelerators for the 1.2.0 release.
mirrors.aliyun.com/pytorch-wheels to https://tzrec.oss-cn-beijing.aliyuncs.com/third_party/torchrec/repo.html
Docker images
New `1.2` images are pushed to a staging repo `tzrec-test` first: `mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/tzrec-test:1.2-{cpu,cu126,cu129}`
CI in this PR runs against `tzrec-test:1.2`. Once all required checks pass, the images are promoted to `tzrec-devel:1.2` via the new `scripts/promote_docker.sh`, and a final commit flips the CI workflow YAMLs back to `tzrec-devel:1.2` so the merged master points at the promoted repo.
Pre-commit
- `ruff-pre-commit` v0.15.4 → v0.15.11
- `codespell` v2.4.1 → v2.4.2
- `pre-commit-hooks` already at latest (v6.0.0)

Docker hardening
Large torch wheel downloads from the aliyun mirror occasionally time out mid-stream. Wrapped the pip installs in an 8x shell retry loop, added `timeout=120 retries=5` to `docker/pip.conf`, and set `pipefail` in `scripts/build_docker.sh` so docker-build failures surface instead of being swallowed by `tee`.
Test plan
- `buildtest_ci` green against `tzrec-test:1.2`
- `unittest_ci` green (GPU, cu129)
- `unittest_cpu_ci` green
- `codestyle_ci` green (new pre-commit versions)
- `pytyping_ci` green (torchrec 1.6 / fbgemm 1.6 / torch 2.11 API surfaces)
- `tzrec-test:1.2-*` → `tzrec-devel:1.2-*` after CI passes
- `tzrec-devel:1.2` and re-run CI green

🤖 Generated with Claude Code