
Implement Harmony2 paper improvements for preventing overintegration#625

Merged
Intron7 merged 10 commits into main from harmony2-support on Apr 8, 2026
Conversation

@Intron7 Intron7 (Member) commented Mar 30, 2026

Implements three algorithmic improvements from the Harmony2 paper (Patikas et al., 2026; doi:10.64898/2026.03.16.711825) to prevent overintegration in biologically heterogeneous
datasets:

  • Stabilized diversity penalty: Changes the penalty formula from (E+1)/(O+1) to (E+1)/(O+E+1), which prevents the diversity term from diverging when a batch is absent from a cluster.
    The old formula could dominate the objective and force overintegrated clusters. Controlled via stabilized_penalty (default True), templated as a compile-time bool in CUDA kernels.
  • Dynamic per-cluster-per-batch ridge regularization: Replaces the fixed scalar ridge_lambda=1.0 with lambda_kb = alpha * E_kb, scaling regularization proportionally to the expected
    batch representation in each cluster. Well-represented batches get strong correction; rare/absent batches get weak correction. Controlled via dynamic_lambda (default True) and alpha
    (default 0.2).
  • Batch pruning: Batches with O_kb / N_b < threshold are pruned from the correction for that cluster (their correction factors are set to zero). Implemented as a special case of
    dynamic lambda — pruned entries get lambda_kb = 1e30, which drives the correction to zero without any kernel changes. Controlled via batch_prune_threshold (default 1e-5).

All three features default to the Harmony2 behavior. To recover the original Harmony1 behavior, pass stabilized_penalty=False and dynamic_lambda=False.
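The three controls described above can be sketched in plain NumPy (rather than the PR's CuPy/CUDA code). This is purely illustrative: diversity_penalty, compute_lambda_kb, and the per-batch count vector N_b are assumed names, not the PR's actual helpers.

```python
import numpy as np

def diversity_penalty(E, O, stabilized=True):
    """Per-(batch, cluster) diversity penalty term.

    Harmony1's (E + 1) / (O + 1) grows without bound in E when a batch is
    absent from a cluster (O == 0); Harmony2's stabilized form is capped at 1.
    """
    if stabilized:
        return (E + 1.0) / (O + E + 1.0)
    return (E + 1.0) / (O + 1.0)

def compute_lambda_kb(E, O, N_b, alpha=0.2, batch_prune_threshold=1e-5,
                      sentinel=1e30):
    """Dynamic per-(batch, cluster) ridge with pruning-as-large-lambda."""
    lambda_kb = alpha * E
    # Guard unused batches (N_b == 0) so the division below stays defined.
    N_b_safe = np.where(N_b == 0, 1.0, N_b)
    # Pruned entries get a huge ridge, which drives 1/(O + lambda_kb),
    # and hence the correction, toward zero without any kernel changes.
    prune = (O / N_b_safe[:, None]) < batch_prune_threshold
    return np.where(prune, sentinel, lambda_kb)
```

Here E and O are (n_batches, n_clusters) matrices of expected and observed counts, and N_b holds per-batch cell counts.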

coderabbitai bot commented Mar 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a stabilized penalty mode and per-(batch,cluster) ridge regularization (lambda_kb), threading both through CUDA kernels, C++/nanobind bindings, Python Harmony API and helpers, updates tests, and adds a bibliography entry for Patikas2026.

Changes

Cohort / File(s): Summary

  • Bibliography & docs (docs/references.bib, src/rapids_singlecell/preprocessing/_harmony_integrate.py): Added @misc{Patikas2026}; updated the harmony_integrate docstring to cite Patikas2026 and added Harmony2-related keyword args (stabilized_penalty, dynamic_lambda, alpha, batch_prune_threshold).
  • Clustering, stabilized flag (src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu, src/rapids_singlecell/_cuda/harmony/clustering/kernels_clustering.cuh): Added a stabilized boolean to the clustering API and objective; diversity_kernel became template<typename T, bool Stabilized>, kernel launches/selectors now depend on stabilized, and nanobind signatures accept the flag.
  • Penalty, stabilized flag (src/rapids_singlecell/_cuda/harmony/pen/pen.cu, src/rapids_singlecell/_cuda/harmony/pen/kernels_pen.cuh): penalty_kernel is now template<typename T, bool Stabilized>; launch_penalty and the exposed penalty binding accept stabilized and select the corresponding kernel instantiation.
  • Correction, per-(batch, cluster) lambda_kb (src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu, src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu, src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh): Replaced the scalar ridge_lambda with a device array lambda_kb (const T* / gpu_array_c<const T, Device>) across kernels, impls, and nanobind bindings; kernel signatures and launches now index per-(batch, cluster) lambda values.
  • Python Harmony API & helpers (src/rapids_singlecell/preprocessing/_harmony/__init__.py, src/rapids_singlecell/preprocessing/_harmony/_helper.py): harmonize() signature extended with stabilized_penalty, dynamic_lambda, alpha, batch_prune_threshold; added _compute_lambda_kb() producing per-(batch, cluster) lambda (dynamic or uniform, with pruning/sentinel and zero-guarding); removed _compute_inv_mats_batched; correction call sites now accept/pass lambda_kb.
  • Tests (tests/test_harmony.py, tests/test_harmony_kernels.py): Expanded and added fixtures and tests for Harmony2: validation of the new kwargs, stabilized modes, per-(batch, cluster) lambda_kb, pruning/sentinel behavior, agreement between fast/batched/original correction, and kernel edge cases (absent batches, zero denominators).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 75.76%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description check: ✅ Passed. The description clearly outlines the three algorithmic improvements from the Harmony2 paper and explains their purpose and controls.
  • Title check: ✅ Passed. The title accurately describes the main change: implementing three algorithmic improvements from the Harmony2 paper to prevent overintegration. It maps directly to the PR's core objective.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/rapids_singlecell/preprocessing/_harmony/_helper.py (1)

554-583: ⚠️ Potential issue | 🟠 Major

Validate lambda_kb before building the inverse tensor.

lambda_kb moved from a scalar to a full matrix, but this helper never checks that it matches O.shape. A (n_clusters,) or (n_batches, 1) input will broadcast through the transpose arithmetic and silently produce the wrong inverse matrices. Normalizing both inputs to dtype here also keeps this path aligned with the requested precision.

♻️ Proposed fix
 def _compute_inv_mats_batched(
     O: cp.ndarray,
     lambda_kb: cp.ndarray,
     dtype: cp.dtype,
 ) -> cp.ndarray:
@@
-    n_batches, n_clusters = O.shape
+    if lambda_kb.shape != O.shape:
+        raise ValueError(
+            "lambda_kb must have shape (n_batches, n_clusters) matching O"
+        )
+    O = O.astype(dtype, copy=False)
+    lambda_kb = lambda_kb.astype(dtype, copy=False)
+
+    n_batches, n_clusters = O.shape
     n_batches_p1 = n_batches + 1
@@
-    factor = 1.0 / (O.T + lambda_kb.T)
+    factor = cp.reciprocal((O + lambda_kb).T)
As per coding guidelines, `**/*.py`: "Check for and fix dtype mismatches (float32 vs float64) between CuPy arrays and Python scalars, especially when passing arrays to CUDA kernels."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rapids_singlecell/preprocessing/_harmony/_helper.py` around lines 554 -
583, In _compute_inv_mats_batched validate and normalize lambda_kb before using
it: ensure lambda_kb is a CuPy array with the exact shape (n_batches,
n_clusters) (raise a clear ValueError if not), cast both O and lambda_kb to the
function dtype (e.g., via cp.asarray(..., dtype=dtype)) to avoid float32/64
mismatches, and only then compute factor = 1.0 / (O.T + lambda_kb.T); this
prevents unintended broadcasting from shapes like (n_clusters,) or (n_batches,1)
and keeps precision consistent with inv_mats.
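The silent-broadcast hazard described above is easy to reproduce. A NumPy stand-in (CuPy follows the same broadcasting rules) shows a wrongly shaped (n_batches, 1) lambda_kb sailing through the transpose arithmetic without any error; checked_factor is an illustrative guard mirroring the suggested fix, not the PR's actual code:

```python
import numpy as np

n_batches, n_clusters = 3, 4
O = np.arange(n_batches * n_clusters, dtype=np.float64).reshape(n_batches, n_clusters)

lam_good = np.full((n_batches, n_clusters), 0.5)   # expected shape
lam_col = np.full((n_batches, 1), 0.5)             # wrong shape, still broadcasts

# Both expressions evaluate without error and even agree in shape; with a
# lambda that varies per cluster the values would silently differ.
f_good = 1.0 / (O.T + lam_good.T)
f_col = 1.0 / (O.T + lam_col.T)                    # lam_col.T is (1, n_batches)
assert f_good.shape == f_col.shape == (n_clusters, n_batches)

def checked_factor(O, lambda_kb):
    # Mirrors the reviewer's suggested guard: fail loudly on shape mismatch.
    if lambda_kb.shape != O.shape:
        raise ValueError(
            "lambda_kb must have shape (n_batches, n_clusters) matching O"
        )
    return 1.0 / (O.T + lambda_kb.T)
```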
tests/test_harmony.py (1)

145-154: ⚠️ Potential issue | 🟠 Major

Add a numerical reference check for the new default path.

This keeps the Harmony1 fallback covered, but the new default (stabilized_penalty=True, dynamic_lambda=True) is still only exercised by shape/correlation checks in this file. A wiring regression in the Harmony2 path can now pass while this reference test stays green. Please add a companion assertion against a frozen Harmony2 output or another trusted reference implementation.

As per coding guidelines, **/*test*.{py,cpp}: "HIGH: Missing validation of numerical correctness against CPU reference, missing edge case coverage (single row, empty input, max-size input), or tests that only check 'runs without error'."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_harmony.py` around lines 145 - 154, The test currently exercises
Harmony1 fallback but lacks a numeric reference for the new Harmony2 default
path; add a companion assertion that runs rsc.pp.harmony_integrate on the same
adata_reference with stabilized_penalty=True and dynamic_lambda=True (and fixed
random seed/dtype) and compares a deterministic numeric output (e.g., corrected
X or embedding returned by rsc.pp.harmony_integrate) against a frozen expected
numpy array using a tight tolerance (np.testing.assert_allclose or
pytest.approx); store the frozen reference as a literal array in
tests/test_harmony.py (or load from a committed fixture) and include the new
assertion in the same test or a new test function to ensure wiring/regression
coverage for the Harmony2 path.
tests/test_harmony_kernels.py (2)

82-109: ⚠️ Potential issue | 🟠 Major

Add an explicit absent-batch case for the Harmony2 formulas.

Both stabilized test paths still sample strictly positive O/E, so they never hit the O_kb == 0 condition this PR is trying to stabilize. A regression in the absent-batch branch would still pass here; please pin at least one entry to O=0 (ideally with a large E) and assert the kernel stays finite and matches the reference in both modes.

As per coding guidelines, "HIGH: Missing validation of numerical correctness against CPU reference, missing edge case coverage (single row, empty input, max-size input), or tests that only check 'runs without error'."

Also applies to: 241-295

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_harmony_kernels.py` around lines 82 - 109, The test_penalty case
currently samples strictly positive O and E so it never exercises the
absent-batch branch (O_kb == 0); update the test to pin at least one O entry to
0 (preferably with a large corresponding E) before calling _pen.penalty (and
likewise add the same absent-batch scenario in the other related test block
covering lines 241-295), then compute the CPU reference denom/expected exactly
as before and assert the kernel output is finite and matches expected for both
stabilized=True and False; reference the test function test_penalty and the
kernel call _pen.penalty to locate where to inject the O[k] = 0 case and the
additional assertions.
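A minimal CPU reference for the absent-batch case (pure NumPy, illustrative only) shows why pinning O_kb = 0 matters: the original formula is unbounded in E there, while the stabilized one is exactly 1.

```python
import numpy as np

E = np.array([1.0, 10.0, 1000.0])   # expected counts, last entry very large
O = np.zeros_like(E)                # batch entirely absent from the cluster

harmony1 = (E + 1.0) / (O + 1.0)        # unbounded in E when O == 0
harmony2 = (E + 1.0) / (O + E + 1.0)    # exactly 1 when O == 0

assert harmony1[-1] == 1001.0
assert np.allclose(harmony2, 1.0)
```

A kernel test that never includes such an entry cannot distinguish the two branches.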

328-352: ⚠️ Potential issue | 🟠 Major

Make lambda_kb non-uniform in test_compute_inv_mat.

cp.full_like(O, ridge_lambda) only verifies the legacy scalar case. A bad stride, transpose, or accidental scalar read in compute_inv_mats_kernel would still pass because every entry is identical; use distinct per-batch/per-cluster values and include a 1e30 sentinel so the new indexing and pruning path are actually exercised.

Suggested test shape
-    ridge_lambda = 1.0
-    lambda_kb = cp.full_like(O, ridge_lambda)
+    lambda_kb = cp.asarray(
+        rng.random((n_batches, n_clusters)) * 2 + 0.1,
+        dtype=dtype,
+    )
+    lambda_kb[0, 0] = dtype(1e30)

As per coding guidelines, "HIGH: Missing validation of numerical correctness against CPU reference, missing edge case coverage (single row, empty input, max-size input), or tests that only check 'runs without error'."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_harmony_kernels.py` around lines 328 - 352, In
test_compute_inv_mat replace the uniform lambda_kb = cp.full_like(O,
ridge_lambda) with a non-uniform per-batch/per-cluster array so the kernel’s
indexing/pruning is exercised: build lambda_kb with the same shape as O (e.g.
lambda_kb = ridge_lambda + (cp.arange(O.size, dtype=dtype).reshape(O.shape) *
small_scale).astype(dtype)) and then set at least one sentinel value to 1e30
(e.g. lambda_kb[0,0]=dtype(1e30)); keep using that lambda_kb when calling
_corr.compute_inv_mat (and ensure dtype and shape match O, n_batches,
n_clusters, cluster_k, inv_mat, g_factor, g_P_row0, stream).
🧹 Nitpick comments (1)
src/rapids_singlecell/preprocessing/_harmony_integrate.py (1)

33-41: Consider making the Harmony2 controls explicit kwargs.

stabilized_penalty and dynamic_lambda are now documented as public knobs, but they still disappear inside **kwargs, so they will not show up in signatures or generated API docs. Making them explicit—or at least listing alpha and batch_prune_threshold under kwargs—would make the default behavior change much easier to discover.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rapids_singlecell/preprocessing/_harmony_integrate.py` around lines 33 -
41, The doc comment points out that stabilized_penalty and dynamic_lambda are
documented but remain hidden in **kwargs; update the harmony_integrate function
signature to expose these parameters explicitly (add stabilized_penalty: bool =
True, dynamic_lambda: bool = True) and likewise add explicit kwargs for alpha
and batch_prune_threshold with their defaults, remove them from blind **kwargs
handling, pass them through to the underlying Harmony2 call (wherever
harmony_integrate forwards params), and update the function docstring to list
these parameters so they appear in generated API docs; ensure any internal use
of **kwargs in harmony_integrate (or helper functions it calls) is adjusted to
accept the new named parameters and that callers are updated if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh`:
- Around line 52-57: The division can produce inf/NaN when o_val + lam == 0;
ensure the per-entry ridge term is strictly positive by flooring the denominator
in the loop that computes f and p (the block using O, lambda_kb, f, p inside the
for over b). Compute a safe denominator = o_val + lam and if it is <= 0 (or
below a tiny epsilon) replace it with a small positive epsilon (use an
appropriate T epsilon, e.g. numeric_limits<T>::epsilon() or a small constant
like 1e-6 as T) before computing f = T(1)/denom and p = -f * o_val so no
division-by-zero or near-zero occurs; alternatively ensure lambda_kb is
validated upstream to be > 0 for all (b,k).

In `@src/rapids_singlecell/preprocessing/_harmony/__init__.py`:
- Around line 58-61: Add upfront validation for the new public knobs so invalid
values fail fast: before the solver is invoked (i.e., before computing lambda_kb
where alpha is used and before applying batch pruning using
batch_prune_threshold in the code in __init__.py), assert that alpha is >= 0 and
that batch_prune_threshold is not None and lies in [0, 1] (or raise a ValueError
with a clear message referencing alpha and batch_prune_threshold). Also document
these checks next to the stabilized_penalty and dynamic_lambda parameter
declarations so callers cannot silently pass alpha < 0 or batch_prune_threshold
> 1.
- Around line 440-457: _compute_lambda_kb computes per-(k,b) regularization but
divides by N_b when forming prune_mask, which can be zero for unused categorical
levels; change the logic to guard against zero N_b by treating those batches as
pruned (i.e., set prune_mask True for rows where N_b == 0) or by using a safe
divisor (e.g., N_b_safe = where(N_b==0, 1, N_b)) before computing (O /
N_b_safe[:, None]); then assign a large regularizer (E.dtype.type(1e30)) to
lambda_kb for those pruned/zero-count batches so downstream 1/(O + lambda_kb)
never divides by zero. Ensure the fix is applied only when dynamic_lambda is
true and keep existing behavior for non-zero N_b.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5dd65525-b9aa-4eda-986c-b2e844796cbc

📥 Commits

Reviewing files that changed from the base of the PR and between 1e193b4 and ba19497.

📒 Files selected for processing (13)
  • docs/references.bib
  • src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu
  • src/rapids_singlecell/_cuda/harmony/clustering/kernels_clustering.cuh
  • src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu
  • src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu
  • src/rapids_singlecell/_cuda/harmony/correction/kernels_correction_fast.cuh
  • src/rapids_singlecell/_cuda/harmony/pen/kernels_pen.cuh
  • src/rapids_singlecell/_cuda/harmony/pen/pen.cu
  • src/rapids_singlecell/preprocessing/_harmony/__init__.py
  • src/rapids_singlecell/preprocessing/_harmony/_helper.py
  • src/rapids_singlecell/preprocessing/_harmony_integrate.py
  • tests/test_harmony.py
  • tests/test_harmony_kernels.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rapids_singlecell/preprocessing/_harmony_integrate.py`:
- Around line 23-26: Add an end-to-end unit test that exercises the public API
functions harmony_integrate() and/or harmonize() using a small synthetic dataset
to ensure the new public knobs (stabilized_penalty, dynamic_lambda, alpha,
batch_prune_threshold) are wired correctly: call the function once with the new
defaults (stabilized_penalty=True, dynamic_lambda=True, alpha=0.2,
batch_prune_threshold=1e-5) and once with the Harmony1 fallback
(stabilized_penalty=False, dynamic_lambda=False) and assert key integration
properties (e.g., output shape, no NaNs, batch-correction effect between
batches, and deterministic behavior for fixed seed); vary alpha and
batch_prune_threshold in a couple of cases to ensure parameters are passed
through to the kernels.

In `@src/rapids_singlecell/preprocessing/_harmony/__init__.py`:
- Around line 176-180: The alpha validation currently only checks for negative
values but allows non-finite values (NaN/inf) which break downstream (lambda_kb)
computations; update the validation near the existing alpha check in __init__.py
to reject non-finite values by using a finite check (e.g., math.isfinite or
numpy.isfinite) and raise ValueError if not finite; keep the existing
batch_prune_threshold check unchanged and include alpha in the error message
(e.g., "alpha must be a finite non-negative number, got {alpha}").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f620f875-30e9-4810-ae3a-197b17326cc8

📥 Commits

Reviewing files that changed from the base of the PR and between ba19497 and bf8d733.

📒 Files selected for processing (5)
  • src/rapids_singlecell/preprocessing/_harmony/__init__.py
  • src/rapids_singlecell/preprocessing/_harmony/_helper.py
  • src/rapids_singlecell/preprocessing/_harmony_integrate.py
  • tests/test_harmony.py
  • tests/test_harmony_kernels.py
💤 Files with no reviewable changes (1)
  • src/rapids_singlecell/preprocessing/_harmony/_helper.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_harmony.py

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/rapids_singlecell/preprocessing/_harmony/__init__.py (1)

176-181: ⚠️ Potential issue | 🟡 Minor

Reject non-finite alpha values.

alpha=float("nan") and alpha=float("inf") both pass the current check. NaN will poison lambda_kb, and inf will suppress all corrections. Add a finite check.

Suggested fix
-    if alpha < 0:
-        raise ValueError(f"alpha must be non-negative, got {alpha}.")
+    if not np.isfinite(alpha) or alpha < 0:
+        raise ValueError(f"alpha must be a finite, non-negative value, got {alpha}.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rapids_singlecell/preprocessing/_harmony/__init__.py` around lines 176 -
181, The current validation for the parameter alpha in __init__.py only checks
for negativity but allows non-finite values like NaN and Inf; update the
validation around the alpha check (the block that raises ValueError when alpha <
0) to also reject non-finite values by using a finite check (e.g., math.isfinite
or numpy.isfinite) and raise a ValueError with a clear message (e.g., "alpha
must be a finite non-negative number, got {alpha}") so that NaN and Inf are
rejected before they can affect lambda_kb or corrections.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c235f253-98ea-4ce7-9efd-18018e22455e

📥 Commits

Reviewing files that changed from the base of the PR and between bf8d733 and 54065fd.

📒 Files selected for processing (2)
  • src/rapids_singlecell/preprocessing/_harmony/__init__.py
  • tests/test_harmony.py

@Intron7 Intron7 (Member, Author) commented Mar 31, 2026

@coderabbitai review

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rapids_singlecell/preprocessing/_harmony/__init__.py`:
- Around line 461-472: The early return when dynamic_lambda is False bypasses
the zero-denominator guard; fix by constructing lambda_kb even in the
fixed-ridge branch (instead of returning cp.full_like(E, ridge_lambda)
immediately) or validate ridge_lambda up front to ensure it cannot be zero, then
always apply the sentinel suppression lambda_kb[(O + lambda_kb) == 0] = sentinel
before returning; refer to the symbols dynamic_lambda, cp.full_like(E,
ridge_lambda), lambda_kb, O, sentinel and ensure the sentinel guard runs
unconditionally for both dynamic and fixed branches (and add a regression test
for dynamic_lambda=False with ridge_lambda=0).

In `@tests/test_harmony.py`:
- Around line 63-90: The fixture adata_ircolitis_harmony2 currently uses
pooch.retrieve to download external files (pcs_file, harmony2_file, obs_file)
which makes tests network-bound; replace this by either bundling a small TSV/CSV
fixture under tests/ (e.g., tests/data/ircolitis_*.tsv.gz) and reading it with
pd.read_csv, or generate a deterministic synthetic AnnData in the fixture that
creates obs (DataFrame) and obsm entries "X_pca" and "harmony2_ref" with
appropriate shapes (use numpy for reproducible random data via a fixed seed);
update the fixture adata_ircolitis_harmony2 to remove pooch.retrieve calls and
load from the new local files or synthetic generation so tests no longer depend
on exampledata.scverse.org.
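The zero-ridge edge case flagged above for the fixed-lambda branch can be sketched in plain NumPy. compute_lambda_kb_fixed is an illustrative name, not the PR's helper; the point is that the sentinel guard must run even when dynamic_lambda is False.

```python
import numpy as np

def compute_lambda_kb_fixed(E, O, ridge_lambda, sentinel=1e30):
    # Illustrative fixed-ridge branch: build lambda_kb instead of returning
    # cp.full_like(E, ridge_lambda) early, so the sentinel suppression runs
    # unconditionally. With ridge_lambda=0 and an absent batch (O == 0),
    # skipping the guard would let 1/(O + lambda_kb) divide by zero.
    lambda_kb = np.full_like(E, ridge_lambda)
    lambda_kb[(O + lambda_kb) == 0] = sentinel
    return lambda_kb

E = np.ones((2, 3))
O = np.zeros((2, 3))          # every batch absent from every cluster
lam = compute_lambda_kb_fixed(E, O, ridge_lambda=0.0)
```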
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50682935-1b5a-4354-b362-5fb05986aa23

📥 Commits

Reviewing files that changed from the base of the PR and between 54065fd and 4c3271b.

📒 Files selected for processing (2)
  • src/rapids_singlecell/preprocessing/_harmony/__init__.py
  • tests/test_harmony.py

@Intron7 Intron7 (Member, Author) commented Mar 31, 2026

@coderabbitai review

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Intron7 Intron7 changed the title from "Harmony-2 update" to "Implement Harmony2 paper improvements for preventing overintegration" on Mar 31, 2026
codecov-commenter commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.98%. Comparing base (e67e3c3) to head (595ad71).

Files with missing lines (patch % / lines missing):
  • ...ids_singlecell/preprocessing/_harmony_integrate.py: 66.66% patch coverage, 4 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #625      +/-   ##
==========================================
+ Coverage   87.75%   87.98%   +0.23%     
==========================================
  Files          96       96              
  Lines        7013     7027      +14     
==========================================
+ Hits         6154     6183      +29     
+ Misses        859      844      -15     
Files with missing lines (coverage Δ):
  • ...pids_singlecell/preprocessing/_harmony/__init__.py: 94.15% <100.00%> (+0.73%) ⬆️
  • ...apids_singlecell/preprocessing/_harmony/_helper.py: 74.87% <ø> (+6.94%) ⬆️
  • ...ids_singlecell/preprocessing/_harmony_integrate.py: 57.50% <66.66%> (+3.92%) ⬆️

@Intron7 Intron7 merged commit 01c94ca into main Apr 8, 2026
23 of 26 checks passed
@Intron7 Intron7 deleted the harmony2-support branch April 8, 2026 12:12