[Atomics] New QIPC ops for atomics #690

Open

hughperkins wants to merge 35 commits into main from hp/new-qipc-ops-atomics

Conversation

@hughperkins
Collaborator


hughperkins added 28 commits May 8, 2026 14:29

Pins the doc claim that vector / matrix arguments to qd.atomic_add fan
out to one scalar atomic per component on every GPU backend. The
existing test_atomic_float_ops only covers scalar ndarray destinations.

Pins the doc claim that qd.atomic_and / atomic_or / atomic_xor reject
f32 / f64 dtypes at trace time. Parametrized over (op, dtype) on
qd.gpu; f64 is gated through skip_if_f64_unsupported.

Pins the per-dtype doc table for atomic_add (i32 / u32 / i64 / u64)
and atomic_and / atomic_or / atomic_xor (u32 / i64 / u64) on qd.gpu.
Existing coverage was effectively i32-only for these paths.

Pins the doc claim that atomic_mul (CAS loop on every backend) works
correctly under multi-thread contention and supports f64. Previous
mul tests were all single-thread or used integer shared memory only.

Pins the doc claim that floating-point atomic_min / atomic_max behave
as minNum / maxNum: when exactly one operand is NaN, the non-NaN
value wins. Both arg orders covered for f32 and f64. CPU is left out
of scope per the doc note about its order-dependent CAS-loop path.
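
A minimal sketch of the pinned behavior, assuming the framework's usual test shape (the import name `quadrants`, the `qd.init` entry point, the `qd.field` constructor, and the `@qd.kernel` decorator are all assumptions here, not quoted from the PR):

import math
import quadrants as qd  # import name assumed from the qd.* prefix used throughout

qd.init(arch=qd.gpu)  # assumed init entry point

x = qd.field(dtype=qd.f32, shape=())  # scalar destination; ctor shape assumed

@qd.kernel  # assumed kernel decorator
def amin(v: qd.f32):
    qd.atomic_min(x[None], v)

# minNum semantics: when exactly one operand is NaN, the non-NaN value
# wins, in both argument orders.
x[None] = 1.5
amin(math.nan)
assert x[None] == 1.5   # NaN operand loses
x[None] = math.nan
amin(1.5)
assert x[None] == 1.5   # NaN destination loses
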
Removes the arch=[qd.cpu, qd.cuda, qd.amdgpu] allowlist and the
"# TODO(): Vulkan support" markers on the f16 atomic tests. The
SPIR-V f16-atomic-cap gate is preserved via _skip_if_no_f16_atomic_on_spirv,
matching the pattern in test_atomic.py::test_atomic_float_ops. CPU
coverage of f16 atomics is retained by test_atomic_float_ops.

When run on CUDA today, qd.atomic_and(f32_field, ...) does not raise
at trace time as the doc claims; it slides past
AtomicOpExpression::type_check (which only checks PrimitiveType, not
is_integral for the bit_* ops) and SIGABRTs inside launch_kernel.

Skip the test with a TODO so the pytest worker survives, but keep the
test body in place so it can be re-enabled the moment the frontend
trace-time check is added.

Adds an is_integral check in AtomicOpExpression::type_check for
bit_and / bit_or / bit_xor so that qd.atomic_and / atomic_or /
atomic_xor on f32 / f64 raise QuadrantsTypeError at trace time
instead of sliding through codegen and SIGABRT'ing in launch_kernel
on CUDA. Matches the doc claim on docs/source/user_guide/atomics.md
("Integer dtypes only -- passing f32 / f64 raises a type error at
trace time").

Unskips test_atomic_bitwise_on_float_field_raises (6 cases).
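
A sketch of the now-unskipped shape, under the same assumptions as above (kernel decorator, field constructor, and the exception's home on the `qd` module are assumptions):

import pytest
import quadrants as qd  # import name assumed

f = qd.field(dtype=qd.f32, shape=())  # float destination; ctor shape assumed

@qd.kernel  # assumed decorator
def bad():
    qd.atomic_and(f[None], 1)

# With the is_integral check in type_check, tracing raises cleanly instead
# of sliding into codegen and SIGABRT'ing in launch_kernel.
with pytest.raises(qd.QuadrantsTypeError):  # exception location assumed
    bad()
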
Metal's RHI (quadrants/rhi/metal/metal_device.mm) sets
`spirv_has_atomic_int64` on Apple7+ / Mac2 via the (misnamed)
`feature_floating_point_atomics` gate, but MSL only ever exposes 64-bit
atomics as `atomic_fetch_min/max` on `uint64`, starting at Apple GPU
family 9 (M3 / A17). `atomic_add` / `sub` / `mul` and the bitwise
family don't exist for 64-bit on any Apple GPU, so pipeline create
fails with `RhiResult=-1` and tears the worker down.

Skip the affected dtype rows for `test_atomic_add_int_contention` and
`test_atomic_bitwise_int_widths` on Metal via a small helper, and add
a footnote to the atomics doc table calling out the Metal limitation.
Tightening / removing the over-permissive cap in `metal_device.mm` is
deliberately out of scope here -- the cap is also consumed by adstack
fallbacks in `runtime/gfx`, so it needs its own audit.

Verified on AMDGPU (MI300X): all 4 dtypes for atomic_add and all
3 dtypes x 3 ops for bitwise still PASS (no test gets skipped on
non-Metal GPUs).
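
A sketch of what that small helper might look like, reusing the `_skip_if_no_int64_atomic_rmw` name that appears later on this branch (the signature and body are illustrative, not the PR's actual code; `qd.metal` is assumed by analogy with `qd.cuda` / `qd.amdgpu`):

import pytest
import quadrants as qd  # import name assumed

def _skip_if_no_int64_atomic_rmw(arch, dtype):
    # MSL only exposes 64-bit atomics as atomic_fetch_min/max on uint64
    # (Apple GPU family 9+); add / sub / mul and the bitwise family don't
    # exist for 64-bit on any Apple GPU, so skip those dtype rows on Metal.
    if arch == qd.metal and dtype in (qd.i64, qd.u64):
        pytest.skip("no 64-bit atomic add/bitwise RMW on Metal")
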
…labels

- Reflow seven multi-line `# Pins the doc claim ...` blocks in
  test_atomic.py from ~70-82c (AI default) to ~110-115c (project 120c
  target). No content changes; pure rewrap.
- Restore inline `# Parallel sum` / `# Serial sum` (and max / min)
  labels in test_atomic_{add,max,min}_f16. They got dropped in
  4fd39f8 when retargeting to qd.gpu but still describe the test
  structure (parallel atomic accumulation vs serial reference loop).

…/ Metal

Wires a new `qd.atomic_exchange(dest, val)` primitive end-to-end:
unconditionally writes `val` into `dest`, returns the old value of
`dest`. Lowers to one native instruction on every backend (CUDA
`atomicExch`, AMDGPU `*_atomic_swap`, SPIR-V `OpAtomicExchange`,
x86 `xchg`).

Layers touched:
- `AtomicOpType::xchg` enum + name table; carve-out in
  `atomic_to_binary_op_type` (xchg has no binary equivalent).
- pybind binding `expr_atomic_xchg` and Python wrapper
  `qd.atomic_exchange` with docstring + example.
- `demote_atomics`: special-case xchg to load + store(val) instead of
  going through the load + binary + store path that the other ops use.
- LLVM codegen (covers CPU / CUDA / AMDGPU): one new entry in the
  AtomicRMW BinOp map for integers; explicit AtomicRMW Xchg case for
  f32 / f64 in `real_type_atomic`.
- SPIR-V codegen (Vulkan / Metal): `OpAtomicExchange` branch in the
  integer path, plus a uint-bitcast path for f32 / f64 global-memory
  xchg that avoids any `spirv_has_atomic_float_*` cap dependency.

Coverage gaps deliberately left for follow-up (called out in
docs/source/user_guide/atomics.md):
- `f16` xchg (would need f16-specific bitcast in real_type_atomic).
- Workgroup-shared float xchg in SPIR-V (would need uint-backing
  analogous to the existing add CAS path).
- i64 / u64 xchg on Metal (same hardware limitation as the rest of the
  64-bit integer atomic family; the existing `_skip_if_no_int64_atomic_rmw`
  helper covers it).

Tests (parametrized on `qd.gpu`, all green on CUDA, Vulkan, AMDGPU --
Metal verification deferred to the user's MacBook):
- `test_atomic_exchange_returns_old_value` -- single-thread sanity over
  i32 / u32 / i64 / u64 / f32 / f64.
- `test_atomic_exchange_swap_under_contention` -- N threads each xchg
  a unique value; assert no value is lost or duplicated (multiset
  invariant).
- `test_atomic_exchange_vector_field_fanout` -- vector arg fans out
  per component, last-writer-wins.

Doc: new row in the support table with footnotes for the gaps above,
new `qd.atomic_exchange` semantics section with a take-ownership
example, and an updated note relating xchg to the still-not-exposed
`atomic_cas`.
test_api[qd] in tests/python/test_api.py is a sorted-equality check
of `dir(quadrants)` against a hand-maintained allow-list. Forgot to
add `atomic_exchange` to that list when wiring the new op, so the
manylinux wheel-test job now fails with `+ 'atomic_exchange'`.

…test

Two small follow-ups to the atomic_exchange PR:

- demote_atomics.cpp: the "no binary equivalent" comment got wrapped at
  ~80c by the original edit. Reflow to one ~115c line so the line-wrap
  CI bot stops flagging it.
- test_atomic_exchange_matrix_field_fanout: pins the doc claim that
  "Vector / matrix arguments fan out per component" for the matrix case
  too. Mirrors test_atomic_add_matrix_field_fanout exactly; the xchg
  vector test (test_atomic_exchange_vector_field_fanout) is already
  green on CUDA / Vulkan / AMDGPU and the matrix version walks the
  same OpAtomicExchange-per-component lowering with a 2x3 slot.

The xchg branch in DemoteAtomics::visit(AtomicOpStmt*) (introduced in
commit 82d0572) is structurally different from every other atomic
op: it lowers to a bare load + store(val) instead of load + binop +
store, because the new value is independent of the old. Coverage bot
flagged it as untested; the existing field-target xchg tests don't
exercise it because field locations are never demoted.

test_atomic_exchange_demoted mirrors test_atomic_add_demoted exactly
(thread-local destination forces the demote pass to fire), and pins
the demoted semantics: first xchg returns the initial value, second
xchg returns the value the first xchg wrote, proving the demoted
load + store round-trips correctly.
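
The pinned semantics, sketched as a test body (import name, decorator, and field constructor assumed):

import quadrants as qd  # import name assumed

out = qd.field(dtype=qd.i32, shape=3)  # ctor shape assumed

@qd.kernel  # assumed decorator
def demoted_xchg():
    x = 0                          # thread-local destination: demote pass fires
    a = qd.atomic_exchange(x, 11)  # demoted load + store; returns the initial 0
    b = qd.atomic_exchange(x, 22)  # returns 11, the value the first xchg wrote
    out[0], out[1], out[2] = a, b, x

demoted_xchg()
assert (out[0], out[1], out[2]) == (0, 11, 22)
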
Wrap-check bot flagged three blocks that were wrapped at ~102-105c
when they could pack to ~115c without exceeding the 120c project
target:

- ops.py atomic_exchange docstring (was 102/103/X -> now 117/111/66)
- test_atomic.py xchg vector-fanout comment (was 105/X/X -> 118/106/93)
- codegen_llvm.cpp xchg-FP comment (was 105/106/X -> 117/115/X)

No semantic changes.

Comment inside the take-ownership atomic_exchange example was wrapped
at ~79 chars (AI default), tripping the wrap-check bot. Rejoin onto a
~95-char line; whole comment now fits on two natural-sentence lines.

* expression_printer.h: add the missing `atomic_xchg` entry to `names_table`
  so AtomicOpType::xchg (= 8) no longer reads past the end of the array, and
  flip the bounds check from `>` to `>=` to make it correct for any future
  op (e.g. cas). Also fix a pre-existing latent bug where the `min` and
  `max` slots were swapped, so IR dumps now print the right op name.
* stmt_op_types.cpp: document that `atomic_to_binary_op_type` intentionally
  has no entry for `mul` or `xchg` (they have no native binary form and
  callers MUST special-case them before dispatching here -- see
  demote_atomics.cpp).
* spirv_codegen.cpp: promote the unsupported xchg fallbacks (shared float
  arrays, global f16) from a generic QD_NOT_IMPLEMENTED hard abort to
  clearer QD_ASSERT_INFO diagnostics that name the missing path.

# Conflicts:
#	docs/source/user_guide/atomics.md
#	tests/python/test_atomic.py

Wires a new `qd.atomic_cas(dest, expected, desired)` primitive across
all backends. Returns the value originally at `dest`; the user
recovers success with `(returned == expected)`. Matches CUDA
`atomicCAS` and SPIR-V `OpAtomicCompareExchange` shape.

CAS is the basic primitive on top of which arbitrary atomic
read-modify-write operations can be built with a retry loop -- the
xchg PR exposed unconditional swap, this one exposes the conditional
form.
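
A sketch of the call shape and the success-recovery idiom (import name, decorator, field constructor, and the kernel return-type annotation all follow the assumed API above):

import quadrants as qd  # import name assumed

slot = qd.field(dtype=qd.i32, shape=())
INIT, MINE = 0, 7

@qd.kernel  # assumed decorator
def try_claim() -> qd.i32:
    # Returns the value originally at dest, whether or not the swap landed.
    return qd.atomic_cas(slot[None], INIT, MINE)  # (dest, expected, desired)

old = try_claim()
success = (old == INIT)  # the documented success-recovery pattern
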

IR plumbing (the only layer where CAS is structurally novel: it's
the first atomic op with three operands):

- AtomicOpType::cas enum + name table; intentionally NOT registered
  in atomic_to_binary_op_type (no binary equivalent, like xchg).
- AtomicOpStmt grows an optional `Stmt *expected` field (default
  nullptr, ignored by every other op_type) plus a 4-arg constructor.
  Existing 3-arg constructor is unchanged so all callers keep
  compiling. expected is added to QD_STMT_DEF_FIELDS so cloning /
  serialization round-trips it.
- AtomicOpExpression similarly grows an `expected` Expr slot + 4-arg
  constructor. type_check requires integer dtypes for all three
  operands (first-pass scope; f32/f64 CAS via uint-bitcast deferred,
  same gap xchg has). flatten threads the third operand into the
  4-arg AtomicOpStmt constructor for the cas case.

Bindings + wrapper:

- pybind: m.def("expr_atomic_cas", ...) -> AtomicOpExpression(cas, ...).
- ops.py: new qd.atomic_cas(x, expected, desired) wrapper. Doesn't
  fit @writeback_binary (3 args), so the lvalue check + wrap_if_not_expr
  are inlined. Added to __all__.
- test_api.py: atomic_cas in the public-API allow-list.

Codegen:

- LLVM (CPU / CUDA / AMDGPU): one new branch in integral_type_atomic
  that emits CreateAtomicCmpXchg and projects field 0 (loaded prior
  value) -- single instruction on every target.
- SPIR-V (Vulkan / Metal): new CAS branch in the integer path emits
  OpAtomicCompareExchange directly (operands: scope, sem_eq, sem_neq,
  value, comparator -- comparator order is from the SPIR-V spec, not
  intuitive). The existing internal CAS call sites (atomic_op_using_cas and
  the shared-array float CAS path) confirm both backends already produce
  the right machine code; the only new work was the public path.

demote_atomics:

- thread-local CAS demotes to load + cmp_eq + select(cmp, val, load) +
  store. Materializing select-then-unconditional-store avoids needing
  to rewrite into an IfStmt mid-pass. Returned value is the prior
  load, matching the contended-path semantics.
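
The demoted transaction, written out as straight-line pseudocode (Python stands in for the IR here; `read` / `write` model the load and store statements):

def demoted_cas(read, write, expected, desired):
    # What DemoteAtomics emits for a thread-local CAS: load + cmp_eq +
    # select + unconditional store, so no IfStmt rewrite is needed mid-pass.
    old = read()                        # prior load: also the returned value
    cmp = (old == expected)             # cmp_eq
    write(desired if cmp else old)      # select(cmp, desired, old) + store
    return old                          # matches the contended-path semantics
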

Coverage gaps deliberately left for follow-up (will be called out in
docs/source/user_guide/atomics.md in the next commit):

- f32 / f64 CAS (would need the same uint-bitcast path xchg uses; the
  type_check carve-out raises a clean QuadrantsTypeError today).
- f16 CAS (same f16-deferral as f16 xchg).
- Workgroup-shared float CAS (same uint-backing requirement as shared
  float xchg/add).
- i64 / u64 CAS on Metal (same hardware limitation as the rest of the
  64-bit integer atomic family; the existing _skip_if_no_int64_atomic_rmw
  helper will cover it once the test lands).

Tests + docs land in follow-up commits on this branch.

Five new tests parametrized on qd.gpu (i64 / u64 covered by the
existing _skip_if_no_int64_atomic_rmw helper for Metal):

- test_atomic_cas_returns_old_value -- single-thread sanity, success
  path AND failure path, all integer dtypes (i32 / u32 / i64 / u64).
- test_atomic_cas_contention_single_winner -- N parallel threads each
  try to flip a slot from INIT to their own unique id; assert exactly
  one thread sees prior == INIT (the winner) and every loser sees
  prior == winner_id, and the slot ends with winner_id (no torn
  writes). The canonical "single CAS winner" invariant.
- test_atomic_cas_loop_increment_matches_atomic_add -- user-built CAS
  retry loop produces the same final counter as atomic_add. Outer for
  is serialized (qd.loop_config(serialize=True)) so the bounded retry
  budget is robust; the point is to validate the user-facing CAS-loop
  pattern, not to stress-test convergence.
- test_atomic_cas_demoted -- thread-local destination forces the
  demote pass to fire (parallel to test_atomic_exchange_demoted).
  Pins both legs: success-path swap fires, failure-path is a no-op,
  both return the prior load.
- test_atomic_cas_on_float_field_raises -- pins the doc claim that
  f32 / f64 CAS raises QuadrantsCompilationError at trace time. Will
  flip to passing when the float CAS bitcast path lands.

Doc:

- New row in the support table for atomic_cas with the f32 / f64
  carve-out (footnote §) and the same i64 / u64 Metal caveat (†) as
  the rest of the family.
- New § footnote explaining that float CAS is rejected at trace time
  but can be enabled later via the same uint-bitcast trick xchg uses.
- Dropped the "There is no atomic_cas" paragraph (CAS is now exposed)
  and replaced it with a Semantics section showing the
  load + cmp + conditional-store transaction, the success-flag
  recovery pattern (`success = (old == expected)`), and a
  CAS-loop example for building atomic ops the framework doesn't
  expose natively.
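
A sketch of the kind of CAS loop that example teaches, here building a saturating increment the framework doesn't expose natively (import name, decorator, field constructor, and the bounded retry budget are illustrative):

import quadrants as qd  # import name assumed

CAP = 100
counter = qd.field(dtype=qd.i32, shape=())

@qd.kernel  # assumed decorator
def saturating_inc(n: qd.i32):
    for i in range(n):
        done = False
        for _ in range(64):  # bounded retry budget, as in the test
            if not done:
                old = counter[None]
                new = old + 1 if old < CAP else CAP  # the RMW CAS lets us build
                # The swap lands only if nobody raced us between load and CAS.
                if qd.atomic_cas(counter[None], old, new) == old:
                    done = True
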
Follow-on to the printer/carve-out cleanup on the parent xchg PR:
* Adds the `atomic_cas` entry to `names_table` (slot 9) so dumping a
  CAS expression prints the right op name (and stays in-bounds w.r.t.
  the new `>=` guard).
* Extends the visit method to also emit `expected=...` for cas, since
  cas is the only AtomicOpExpression with a 3rd operand.
* Updates the `atomic_to_binary_op_type` carve-out comment to mention
  `cas` alongside `mul` and `xchg` -- cas takes (dest, expected,
  desired), so it has no plain binary form either.

Per review: don't promise/preview the wiring for f32 / f64 atomic_cas
in the user guide; just document the current trace-time rejection and
keep the rest of the integer-CAS support note unchanged.
"fully supported, and the i64 / u64 Metal caveat ..." was self-
contradictory: if it's fully supported there's no caveat. Spell out
that the support is per the per-backend table above, with the same
Metal i64 / u64 caveat as the rest of the 64-bit integer family.

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c2ac851e3


Comment on lines +982 to +985

auto expected_dtype = expected.get_rvalue_type();
if (expected_dtype->is<TensorType>()) {
  expected_dtype = expected_dtype.get_element_type();
}

P1: Reject or scalarize tensor CAS operands

When atomic_cas is called on a vector/matrix lvalue, this accepts tensor operands by peeling expected down to its element type, but the existing tensor scalarization paths only split dest and val and recreate AtomicOpStmt(stmt->op_type, ..., val_i) without carrying expected. The resulting scalar CAS statements have expected == nullptr, so the new CAS codegen/demotion assertions fire for inputs like qd.atomic_cas(vec_field[None], qd.Vector([0, 0]), qd.Vector([1, 1])). Until the scalarizers are updated to split and pass expected_i, tensor CAS should be rejected here.


Comment on lines +986 to +989

if (!is_integral(expected_dtype)) {
  ErrorEmitter(
      QuadrantsTypeError(), this,
      fmt::format("'atomic_cas' requires integer 'expected' operand, got '{}'",
                  expected->ret_type->to_string()));

P1: Cast CAS expected to the destination type

This only checks that expected is integral, but it never casts it to the pointee type. irpass::type_check will cast val through type_check_store, while expected is passed directly to LLVM CreateAtomicCmpXchg / SPIR-V OpAtomicCompareExchange, both of which require the comparator to have the same type as the value in memory. Common documented calls such as qd.atomic_cas(i64_field[None], 0, 1) leave expected as the default i32, causing invalid IR/codegen instead of behaving like the other atomic RHS operands.


@alanray-tech (Collaborator) left a comment


Review

The two issues flagged by Codex are both real and must be fixed before merge:

1. Tensor CAS scalarization will crash (vector/matrix atomic_cas)

type_check in frontend_ir.cpp:982-984 accepts tensor-typed expected (peeling it to element type), but neither scalarize.cpp:531 nor lower_matrix_ptr.cpp:419 propagates expected when splitting atomic ops into per-component statements -- they both use the 3-arg AtomicOpStmt constructor which sets expected = nullptr. Any call like qd.atomic_cas(vec_field[None], Vector([0, 0]), Vector([1, 1])) will hit QD_ASSERT(stmt->expected != nullptr) in codegen/demotion and crash.

Fix: Either reject tensor destinations for CAS in type_check (smallest contact area), or update both scalarizers to split and thread expected_i into the 4-arg constructor. The former is safer for a first landing.

2. CAS expected operand is never type-cast to the pointee type

type_check validates that expected is integral but never inserts a cast to match the destination type. Meanwhile val gets implicit widening via type_check_store in irpass::type_check (type_check.cpp:74). A common call like qd.atomic_cas(i64_field[None], 0, 1) leaves expected as i32 while memory holds i64, producing invalid IR for both LLVM CreateAtomicCmpXchg and SPIR-V OpAtomicCompareExchange (both require all operands to have the same type as the value in memory).

Fix: Add a type_check_store-equivalent for stmt->expected in the AtomicOpStmt visitor in type_check.cpp, or cast expected alongside val in AtomicOpExpression::type_check.


A few additional suggestions are left as inline comments.

Comment thread on quadrants/ir/frontend_ir.h (Outdated)
}

AtomicOpExpression(AtomicOpType op_type, const Expr &dest, const Expr &expected, const Expr &val)
    : op_type(op_type), dest(dest), val(val), expected(expected) {
Nit: the member initializer list order dest(dest), val(val), expected(expected) doesn't match the parameter order (dest, expected, val). C++ initializes in declaration order (dest, val, expected), so runtime behavior is correct, but the mismatch between parameter semantics and init order is a readability trap for maintainers. Consider either reordering the member declarations to dest, expected, val or swapping the parameter order to (dest, val, expected) to make the intent obvious.

Comment on lines +1768 to +1770

ir_->make_inst(spv::OpMemoryBarrier, ir_->const_i32_one_,
               ir_->uint_immediate_number(ir_->u32_type(),
                                          spv::MemorySemanticsAcquireReleaseMask |
                                              spv::MemorySemanticsUniformMemoryMask));

This OpMemoryBarrier is inserted before the CAS but none of the other atomic ops (including xchg, atomic_add, atomic_min, etc.) have it. OpAtomicCompareExchange already provides its own memory semantics via the sem_eq / sem_neq operands (which are const_i32_zero_ here -- Relaxed). If the barrier is genuinely needed for CAS correctness, it should also be needed for the other atomic ops (they all use Relaxed semantics too). If it's not needed, it adds an unnecessary performance cost and an asymmetry that will confuse readers. Please either remove it or add a comment explaining why CAS specifically needs an extra barrier that no other atomic op does.

if old == expected:
    x[()] = desired
return old

if not (is_quadrants_expr(x) and x.ptr.is_lvalue()):

Minor: @writeback_binary (used by all other atomic ops) has an extra guard if isinstance(a, Field) or isinstance(b, Field): return NotImplemented that lets Python's operator protocol fall through cleanly. atomic_cas inlines the lvalue check but skips this Field guard. Since atomic_cas is a regular function call (not an operator), the NotImplemented path isn't strictly needed, but if someone passes a raw Field instead of field[None], they'll get a confusing AttributeError from x.ptr rather than the clear QuadrantsSyntaxError. Consider adding a similar early check or at least a guard like if isinstance(x, Field): raise QuadrantsSyntaxError(...).
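
The suggested guard, sketched against the wrapper head (Field's import location, the message text, and the elided body are assumptions):

def atomic_cas(x, expected, desired):
    # Early guard per the review: a raw Field (rather than field[None]) should
    # fail with a clear error, not an AttributeError on x.ptr further down.
    if isinstance(x, Field):
        raise QuadrantsSyntaxError(
            "atomic_cas expects a field element such as f[None], not the Field itself")
    ...  # lvalue check + wrap_if_not_expr, as described above
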

@hughperkins changed the title from "[Atomics] Add new QIPCatomic ops" to "[Atomics] Add new QIPC atomic ops" on May 12, 2026

# Conflicts:
#	docs/source/user_guide/atomics.md

Two related P1 fixes from PR #690 review (Codex + alanray-tech):

1. Reject tensor (vector / matrix) destinations explicitly in
   AtomicOpExpression::type_check. The other atomic ops fan out via
   scalarize / lower_matrix_ptr, but those passes use the 3-arg
   AtomicOpStmt constructor and would silently drop `expected`,
   tripping QD_ASSERT(stmt->expected) in codegen / demotion. Until
   the scalarizers grow a 4-arg path, refusing tensor CAS at trace
   time is the safest landing.

2. Cast `expected` to the destination element type in the same
   type_check carve-out. LLVM CreateAtomicCmpXchg and SPIR-V
   OpAtomicCompareExchange both require the comparator width to
   match the in-memory value; without this, a common call like
   `qd.atomic_cas(i64_field[None], 0, 1)` would leave `expected`
   at the i32 default and produce invalid IR. Mirrors how `val`
   gets widened in irpass::type_check::type_check_store, which
   doesn't see `expected`.

Tests added:
* test_atomic_cas_on_vector_field_raises - pins the tensor reject.
* test_atomic_cas_expected_int_literal_widens_to_i64 - pins that
  plain Python int literals work as the comparator on i64.

…change

Per PR #690 review (alanray-tech): no other atomic op in this file
emits an OpMemoryBarrier alongside its OpAtomicI* / OpAtomicExchange,
and OpAtomicCompareExchange already carries its own sem_eq / sem_neq
operands (which we set to Relaxed, matching the rest of the file).
The extra barrier was an asymmetric copy from elsewhere; removing it
keeps the surrounding-memory ordering contract consistent with every
other atomic - users pair atomics with qd.simt.block.mem_fence() /
qd.simt.grid.mem_fence() if they need cross-location ordering.

Per PR #690 review (alanray-tech): the 4-arg constructor takes
`(dest, expected, val)` but member declaration was `dest, val,
expected`. Behavior was correct (C++ initializes in declaration
order regardless), but the mismatch was a maintainer trap and
clang-tidy normally flags this kind of init-list-out-of-order.

Reorder the members to `dest, expected, val` so both the 4-arg
ctor's init list and the parameter order read top-to-bottom in
the same direction. The 3-arg ctor still leaves `expected` empty
via Expr's default constructor.
@hughperkins changed the title from "[Atomics] Add new QIPC atomic ops" to "[Atomics] New QIPC ops for atomics" on May 12, 2026

Per PR #690 review (alanray-tech): @writeback_binary (used by every
other atomic op) returns NotImplemented when called on a raw Field,
which lets Python's operator protocol fall through cleanly and gives
the user a clean error. atomic_cas takes 3 args so it doesn't use
@writeback_binary; previously, passing a Field (rather than
`field[None]`) would crash later with a confusing AttributeError on
x.ptr.

Add an explicit isinstance(x, Field) check that raises a clear
QuadrantsSyntaxError pointing the user at the correct call shape.
Test added: test_atomic_cas_raw_field_raises_clear_error.

@hughperkins (Collaborator, Author)

Comments addressed, per Opus:

[screenshot attached: 2026-05-12 12:32:57]

Three new tests broadening atomic_cas coverage beyond the qd.field
surface, in response to a follow-up review on PR #690:

* test_atomic_cas_on_ndarray: pin that CAS works on qd.ndarray
  elements, not just qd.field. Ndarrays go through a distinct
  physical-pointer subscript path. Uses an N-thread contention
  kernel so atomicity gets exercised on the ndarray surface too.

* test_atomic_cas_on_vector_field_scalar_component: pin that the
  documented workaround for the new tensor-CAS rejection -- extract
  a scalar with `vec_field[None][i]` and CAS on it -- actually
  works. Without this the docs would lie.

* test_atomic_cas_expected_signed_int_literal_casts_to_unsigned:
  pin that the new `expected` cast in AtomicOpExpression::type_check
  handles signed -> unsigned conversion (i.e. -1 -> 0xFFFF...).
  Complements the i64 widening test which only covers positive
  same-sign widening.
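
The documented workaround that the second test pins, sketched (the vector-field constructor shape is an assumption):

import quadrants as qd  # import name assumed

vec = qd.Vector.field(2, dtype=qd.i32, shape=())  # assumed vector-field ctor

@qd.kernel  # assumed decorator
def cas_component():
    # Tensor destinations are rejected at trace time, but extracting one
    # scalar component first keeps CAS usable on vector fields.
    qd.atomic_cas(vec[None][0], 0, 42)
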
