
chore(release): promote rc-2026.6.2 to 0.2.8#129

Merged
jacderida merged 66 commits into main from rc-2026.6.2
Jun 18, 2026

Conversation

@jacderida

Contributor

Promotes ant-core + ant-cli to 0.2.8 (final), stripping -rc and pinning upstreams to their published crates.io versions:

  • ant-protocol 2.2.0, ant-node 0.13.0 (runtime optional + test-utils dev-dep)
  • saorsa-core 0.26.0 comes through ant-protocol's re-exports

Verified: cargo check --all-targets --all-features passes against the published crates. ant-core publishes to crates.io; ant-cli ships a GitHub binary.

jacderida and others added 30 commits June 8, 2026 13:51
The rc-2026.6.2 cut rewrote only ant-core's runtime `ant-protocol`
dep to the git rc branch, leaving the optional `devnet` ant-node and
the test-only ant-node/saorsa-core dev-deps on their released
versions (ant-node 0.11.6 -> ant-protocol 2.1.2 / saorsa-core 0.24.5).
That pulled a second protocol lineage into the graph, so any target
bridging ant-core and ant-node (devnet, E2E, merkle-e2e tests) saw two
incompatible copies of `ant_protocol::transport::P2PNode` and failed
to compile with E0308. Point all three pins at the matching rc
branches so the graph collapses to a single git-rc lineage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CLI merkle upload path stored each wave of 64 chunks through
`merkle_store_with_retry` with up to 4 attempts and 30s jittered backoffs,
and a hard barrier: wave N+1 could not start until wave N's retry loop fully
drained. A handful of quorum-short chunks therefore parked the wave's other
~63 slots idle through multiple backoffs — the single biggest throughput sink
on the PROD-UL-01 run (one wave alone burned 34 minutes).

Port the download path's deferred-retry design to the upload path:

- Store each wave in a single pass (`max_attempts = 1`, no backoff) so a wave
  never blocks on a slow chunk.
- Collect quorum-short chunks into a file-level deferred set and advance to the
  next wave immediately.
- After the last wave, retry the whole deferred set in concurrent rounds with
  `[0, 15, 45]s` delays (matching the download path), re-reading each chunk's
  body from the spill at retry time (peak RAM unchanged) and reusing its proof.

Failure semantics are preserved: chunks still short after the final round
surface as `PartialUpload`; a non-quorum error aborts as `PartialUpload` while
preserving earlier progress. Stats and progress numbering are carried across
rounds, with each deferred round's successes recorded in its own histogram slot.
Total per-chunk retry budget is unchanged (1 wave pass + 3 deferred rounds).

Adds `merkle_deferred_retry`, `DeferredRetryOutcome`,
`deferred_round_histogram_slot`, `DEFERRED_ROUND_DELAYS_SECS`, and unit tests.
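
The control flow described above can be sketched roughly as follows. This is a minimal, sequential illustration and not the actual `merkle_deferred_retry` code: `upload_with_deferred_retry`, `Attempt`, and the `store`/`wait` closures are hypothetical names, chunks are reduced to integer ids, and the real path runs stores concurrently and re-reads bodies from the spill. Only the `[0, 15, 45]` delays and the "1 wave pass + 3 deferred rounds" budget come from the commit message.

```rust
/// Delays before each deferred round, in seconds (values from the commit message).
const DEFERRED_ROUND_DELAYS_SECS: [u64; 3] = [0, 15, 45];

/// Hypothetical result of a single store attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Attempt {
    Stored,
    QuorumShort,
}

/// Store every wave in one pass, deferring quorum-short chunks, then retry the
/// deferred set in rounds. Returns the ids still short after the last round.
/// `wave_size` must be non-zero (`chunks` panics on 0).
fn upload_with_deferred_retry<S, W>(
    chunk_ids: &[u32],
    wave_size: usize,
    mut store: S,
    mut wait: W,
) -> Vec<u32>
where
    S: FnMut(u32) -> Attempt,
    W: FnMut(u64),
{
    let mut deferred = Vec::new();
    // Single pass per wave: no in-wave retry, so a slow chunk never blocks the next wave.
    for wave in chunk_ids.chunks(wave_size) {
        for &id in wave {
            if store(id) == Attempt::QuorumShort {
                deferred.push(id);
            }
        }
    }
    // Deferred rounds run only after the last wave has been attempted.
    for delay in DEFERRED_ROUND_DELAYS_SECS {
        if deferred.is_empty() {
            break;
        }
        wait(delay);
        // Keep only the chunks that are still short of quorum after this round.
        deferred.retain(|&id| store(id) == Attempt::QuorumShort);
    }
    deferred
}
```

Passing the wait as a closure keeps the sketch testable without real sleeps; a caller would presumably map a non-empty return value to `PartialUpload`.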

V2-466

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… store AIMD

The adaptive store concurrency limiter never ramped from its cold-start of
8 and got crushed to a +1-per-window crawl because two non-capacity
signals polluted its health input on the merkle upload path:

- Node-side PUT latency is dominated by the ~28s synchronous merkle
  closeness lookup, inflating client-observed p95/median to 3-6x and
  tripping the latency-vs-baseline Decrease even though nothing about it
  is local congestion.
- Remote application rejections (pool-rejected, disk-full, quote-stale)
  arrived as Error::Protocol / flattened Error::InsufficientPeers and were
  classified as NetworkError, counting against success_target and driving
  multiplicative decrease. With the default slow_start_ramp_threshold of 0,
  a single such Decrease permanently exited slow-start.

Apply the fetch-channel precedent to the store channel (the situation is
structurally identical — verification variance instead of retry variance),
plus preserve the structured remote rejection reason so it classifies
correctly. The cold-start floor of 8 is deliberately unchanged.

- adaptive.rs: store_cfg.latency_decrease_enabled = false and
  store_cfg.slow_start_ramp_threshold = usize::MAX, so a transient Decrease
  halves but the next healthy window re-doubles. Genuine store congestion
  still surfaces via the timeout-rate ceiling.
- error.rs/chunk.rs: new Error::RemotePut { address, source: ProtocolError }
  carrying the structured upstream discriminant instead of stringifying it
  into Error::Protocol. A ChunkPutResponse::Error means the transport
  round-trip succeeded and the node declined at the application layer.
- chunk.rs: chunk_put_to_close_group surfaces a representative RemotePut for
  app-only quorum shortfalls; any genuine transport failure keeps it
  InsufficientPeers so real congestion still cuts the cap.
- mod.rs: classify_error maps RemotePut to ApplicationError.
- merkle.rs: merkle_store_with_retry treats RemotePut as recoverable
  (defer/retry) like InsufficientPeers, so transient rejections don't abort
  the upload.

Adds unit coverage: store ramps/recovers under the new tuning while a
timeout burst still cuts it; remote app-rejections don't move the cap;
RemotePut is recoverable in the retry path.
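
To illustrate why the classification matters, here is a simplified limiter sketch. It is not the code in adaptive.rs: `StoreLimiter`, `Outcome`, `on_window`, and the 10% failure threshold are assumptions made for the example. It only shows the intended behaviour: application-level rejections are excluded from the health input, a transport-failure burst halves the cap, and the next healthy window doubles it again.

```rust
/// How one store attempt is seen by the concurrency limiter (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Success,
    /// The node answered and declined (e.g. disk-full): not a capacity signal.
    ApplicationError,
    /// Timeout or transport failure: a genuine congestion signal.
    NetworkError,
}

struct StoreLimiter {
    cap: usize,
    min: usize,
    max: usize,
}

impl StoreLimiter {
    /// Fold one observation window into the concurrency cap.
    fn on_window(&mut self, outcomes: &[Outcome]) {
        let capacity_samples = outcomes
            .iter()
            .filter(|o| **o != Outcome::ApplicationError)
            .count();
        if capacity_samples == 0 {
            return; // only app-level rejections: leave the cap alone
        }
        let network_errors = outcomes
            .iter()
            .filter(|o| **o == Outcome::NetworkError)
            .count();
        // Assumed threshold: more than 10% transport failures counts as unhealthy.
        if network_errors * 10 > capacity_samples {
            self.cap = (self.cap / 2).max(self.min); // multiplicative decrease
        } else {
            self.cap = (self.cap * 2).min(self.max); // doubling stays available after a decrease
        }
    }
}
```

Starting from the cold-start cap of 8, a healthy window moves the cap to 16, a window of pure remote rejections leaves it at 16, and a window dominated by transport failures halves it back to 8.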

Linear: V2-468

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(client): stop verification latency and app-rejections suppressing store AIMD
…ounting

Addresses review on the deferred merkle-upload retry path.

1. Memory bound (high): the deferred pass read every quorum-short chunk in the
   whole file into one Vec per round before storing, so peak resident bodies
   scaled with the file-wide deferred count rather than the wave path's
   ~UPLOAD_WAVE_SIZE / ~256 MB bound. merkle_deferred_retry now takes a
   batch_size and processes each round in batches of that size, re-reading only
   one batch of bodies from the spill at a time. The CLI caller passes
   UPLOAD_WAVE_SIZE.

2. Fatal-abort accounting (medium): merkle_store_with_retry returned Err on a
   non-quorum error, discarding the successes already recorded in that pass; the
   wave/deferred callers then built PartialUpload from stale state (could report
   failed_count = 0 and omit same-pass stores). The store helper now preserves
   same-pass successes (stored/stored_addresses), records the fatal chunk as
   failed, and surfaces the error via a new MerkleStoreOutcome::fatal field
   instead of Err. The external-signer path re-raises fatal as Err to keep its
   all-or-nothing contract; the CLI wave and deferred paths fold it into a
   PartialUpload whose failed set is derived authoritatively as every input
   chunk not in stored_addresses (shared partial_upload_after_fatal helper), so
   stored_count + failed_count accounts for the whole file. This also fixes the
   pre-existing wave-path under-reporting the review noted.

Tests: same-pass successes preserved on fatal; deferred reads bounded to
batch_size; updated the non-quorum-error test to assert fatal-in-outcome.
cargo test -p ant-core --lib -> 338 passed; clippy and fmt clean.
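
The accounting rule in item 2 (the failed set is every input chunk not in stored_addresses) might look like the following sketch. The function name matches the helper mentioned above, but the signature, the `Address` alias, and the `PartialUpload` struct shape are assumptions for illustration, and the inputs are assumed to be distinct.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a chunk address.
type Address = u64;

#[derive(Debug, PartialEq)]
struct PartialUpload {
    stored: Vec<Address>,
    failed: Vec<Address>,
}

/// Derive the failed set from the authoritative stored set, so that
/// stored + failed always covers every input chunk.
fn partial_upload_after_fatal(
    inputs: &[Address],
    stored_addresses: &HashSet<Address>,
) -> PartialUpload {
    let (stored, failed): (Vec<Address>, Vec<Address>) = inputs
        .iter()
        .copied()
        .partition(|addr| stored_addresses.contains(addr));
    PartialUpload { stored, failed }
}
```

Deriving `failed` by subtraction rather than from per-pass bookkeeping is what rules out a report with failed_count = 0 after an abort.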

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_uploads

feat(client): download-style deferred retry for merkle uploads
Use ant-protocol via git instead of the local checkout path.

BREAKING CHANGE: removes ant-core bootstrap-cache recording hooks and the bootstrap-cache E2E/dev-dependency surface.
Keep Client::connect exact to its supplied bootstrap peers while preserving CLI cache warm-start behavior. Filter cached bootstrap addresses for ipv4-only runs and update git dependency locks to the pushed timeout-removal branches.

SemVer: patch
Retain cached bootstrap peers by peer-id keyspace coverage before recency while still enforcing IP diversity limits. Recency remains the tie-breaker among equally diverse candidates.

SemVer: patch
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The cold-start-from-disk bootstrap cache test that used this dev-dependency
was removed with the rest of the bootstrap cache integration, so the direct
saorsa-core dev-dependency is now dead. Removing it keeps the manifest and
lockfile consistent (the lock no longer carries the ant-core -> saorsa-core
edge).
estimate_upload_cost sampled only the first ESTIMATE_SAMPLE_CAP chunk
addresses, so a file whose leading chunks were already stored but whose
tail was new returned CostEstimationInconclusive even though a real
estimate was obtainable. Display consumers (the GUI) were left with no
value to show.

- Distributed sampling: sample addresses spread evenly across the whole
  chunk list instead of the first N (distributed_sample_indices, unit-
  tested). Files with <= cap chunks still sample every chunk, preserving
  exact "whole file sampled" detection.
- The residual all-stored-but-incomplete case returns Ok with
  storage_cost_atto "0" instead of erroring, tagged with a new
  CostEstimateConfidence enum (PricedSample / VerifiedAllAlreadyStored /
  AllSamplesAlreadyStoredIncomplete). The CLI renders the confidence.

UploadCostEstimate is now #[non_exhaustive] with a #[serde(default)]
confidence field. Error::CostEstimationInconclusive is retained (no
longer produced) to avoid removing a public variant.

BREAKING CHANGE: UploadCostEstimate is #[non_exhaustive] and gained a
`confidence` field; downstream code constructing or exhaustively
destructuring it must update.
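
One plausible way to spread a fixed number of samples across the whole chunk list is shown below. The function name is taken from the commit message, but the signature and the exact index formula are assumptions; the real implementation may choose indices differently.

```rust
/// Pick up to `cap` indices spread evenly over `0..len`.
/// When `len <= cap`, every index is returned, so the caller can still
/// detect that the whole file was sampled.
fn distributed_sample_indices(len: usize, cap: usize) -> Vec<usize> {
    if len == 0 || cap == 0 {
        return Vec::new();
    }
    if len <= cap {
        return (0..len).collect();
    }
    // With len > cap, i * len / cap is strictly increasing, so indices are unique
    // and the largest one, (cap - 1) * len / cap, stays below len.
    // Note: i * len can overflow for extremely large inputs; a sketch-level caveat.
    (0..cap).map(|i| i * len / cap).collect()
}
```

For example, `distributed_sample_indices(100, 4)` yields `[0, 25, 50, 75]`, so a file whose first chunks are already stored still has its tail sampled.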

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add `Client::file_download_to_sender`, which downloads + decrypts a file and
streams the plaintext to a caller-provided `mpsc::Sender<Result<Bytes>>`
instead of writing to disk. Constant memory (one decrypt batch resident at a
time, same as `file_download`), and the caller receives bytes progressively as
each batch decrypts — suitable for forwarding to an HTTP chunked body or a
gRPC response stream. The bounded sink applies backpressure; a dropped receiver
(client disconnect) ends the download early.

Implemented by extracting the existing batched-fetch + streaming-decrypt loop
out of `file_download_with_progress` into a private sink-parameterized core,
`download_decrypted_chunks(.., on_chunk)`. `file_download_with_progress` is now
a thin wrapper whose sink writes to the temp file + atomic-renames (behavior
unchanged); the new method's sink forwards to the channel. No duplication of
the fetch/retry logic, and `&self` is preserved (the caller spawns + owns the
Receiver), so no `Client: Clone`/`'static` bound is required.

Adds an e2e round-trip test that streams a multi-batch (~1 MiB) file through
the channel and asserts the reassembled bytes equal the source.
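
The sink behaviour (backpressure from a bounded channel, early exit on a dropped receiver) can be illustrated with the standard library alone. This sketch uses `std::sync::mpsc::sync_channel` as a stand-in for the async `mpsc::Sender` named above, `Vec<u8>` in place of `Bytes`, and hypothetical names `stream_batches` and `DownloadError::ReceiverGone`; it does not reproduce the `file_download_to_sender` signature.

```rust
use std::sync::mpsc::SyncSender;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum DownloadError {
    /// The consumer dropped its receiver (e.g. client disconnect).
    ReceiverGone,
}

/// Forward each decrypted batch to the channel as soon as it is ready.
/// Returns the number of bytes forwarded.
fn stream_batches(
    batches: &[Vec<u8>],
    tx: &SyncSender<Result<Vec<u8>, DownloadError>>,
) -> Result<usize, DownloadError> {
    let mut sent = 0;
    for batch in batches {
        // `send` blocks while the bounded channel is full (backpressure) and
        // fails once the receiver is dropped, which ends the loop early.
        tx.send(Ok(batch.clone()))
            .map_err(|_| DownloadError::ReceiverGone)?;
        sent += batch.len();
    }
    Ok(sent)
}
```

A caller would typically run the producer on its own thread or task and drain the receiver into an HTTP body or response stream, with the channel capacity bounding how many batches are buffered.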

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop redundant .into() on already-Bytes decrypt result (clippy useless_conversion) and apply rustfmt reflows in file.rs + e2e_file.rs. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- TempDownload RAII guard: removes the staging file on every disk-path
  error AND on a panic unwind out of the block_in_place decrypt loop,
  replacing three duplicated cleanup arms (#1). drop(file) before rename
  for Windows.
- New Error::Cancelled variant for a dropped receiver; was misclassified
  as Error::Network (#3). Routed to ApplicationError in classify_error so
  caller-initiated cancellation is not retried as a transport failure.
- Doc the exact channel item type Result<Bytes, Error> on
  file_download_to_sender (#4).
- Drop now-stale #[allow(clippy::unused_async)] on file_download (#7).
- Harden e2e test: assert each streamed chunk is non-empty and >=2
  segments arrive (multi-batch property), rename to
  test_file_download_to_sender_multibatch_round_trip (#6).
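
A drop-based cleanup guard of the kind described in the first bullet could be sketched like this. The struct name follows the commit message, but the fields and the `new`/`commit` methods are assumptions; the real guard may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Removes the staging file on drop unless the download was committed.
struct TempDownload {
    path: PathBuf,
    committed: bool,
}

impl TempDownload {
    fn new(path: PathBuf) -> Self {
        Self { path, committed: false }
    }

    /// Atomically move the staging file into place and disarm the guard.
    /// If the rename fails, the guard is still armed and cleans up on drop.
    fn commit(mut self, dest: &Path) -> io::Result<()> {
        fs::rename(&self.path, dest)?;
        self.committed = true;
        Ok(())
    }
}

impl Drop for TempDownload {
    fn drop(&mut self) {
        if !self.committed {
            // Best effort: runs on early return and on panic unwind alike.
            let _ = fs::remove_file(&self.path);
        }
    }
}
```

Because cleanup lives in `Drop`, every error path and a panic unwind share one removal site instead of separate cleanup arms.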

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
It was an intra-doc link from the public file_download_with_progress to the private TempDownload struct, tripping -D rustdoc::private_intra_doc_links. A plain code span conveys the same thing without the link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-streaming

feat(data): stream decrypted file download to a channel sink
…461)

The single-node payment path aborted the entire file on the first wave
with any chunk short of quorum: `upload_spill_addresses_single`
`?`-propagated the per-wave `PartialUpload` from
`batch_upload_chunks_with_events`, so later waves — already
self-encrypted, spilled, and sometimes already paid — were never
attempted. In PROD-UL-02 this turned ~85% per-chunk success into 0%
per-file success, killing every upload at wave 1 of N.

Align it with the merkle path (`upload_waves_merkle`): a wave short of
quorum records its failed chunks and continues; after all waves are
attempted the file returns a single `Error::PartialUpload` with the full
stored/failed breakdown. Genuinely fatal errors (wallet/payment
infrastructure, missing proofs, spill reads) still abort immediately.
The recoverable-vs-fatal decision is factored into a pure `fold_single_wave`
helper with unit tests. Because `UPLOAD_WAVE_SIZE == PAYMENT_WAVE_SIZE`,
each batch call is exactly one payment wave, so folding its `PartialUpload`
leaves nothing un-attempted within the wave.
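
As a rough illustration of the recoverable-versus-fatal fold, consider the sketch below. The function name comes from the commit message; `WaveError`, `FileProgress`, the integer chunk ids, and the signature are hypothetical simplifications of the real error and progress types.

```rust
/// Hypothetical per-wave error: either a quorum shortfall or something fatal.
#[derive(Debug, PartialEq)]
enum WaveError {
    PartialUpload { stored: Vec<u32>, failed: Vec<u32> },
    Fatal(String),
}

/// Hypothetical file-level accumulator.
#[derive(Debug, Default, PartialEq)]
struct FileProgress {
    stored: Vec<u32>,
    failed: Vec<u32>,
}

/// Fold one wave's result into the file-level progress.
/// A partial wave is recorded and the upload continues; anything else aborts.
fn fold_single_wave(
    progress: &mut FileProgress,
    wave: Result<Vec<u32>, WaveError>,
) -> Result<(), WaveError> {
    match wave {
        Ok(stored) => {
            progress.stored.extend(stored);
            Ok(())
        }
        Err(WaveError::PartialUpload { stored, failed }) => {
            progress.stored.extend(stored);
            progress.failed.extend(failed);
            Ok(()) // recoverable: move on to the next wave
        }
        Err(fatal) => Err(fatal), // e.g. wallet or spill-read failure
    }
}
```

After the last wave, a caller would presumably return a single partial-upload error if `progress.failed` is non-empty.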

Also surface on-chain spend on a partial upload: a partial still pays for
the chunks it paid for, but the spend was silently dropped. Add a boxed
`PartialUploadSpend` (storage_cost_atto + gas_cost_wei) to
`Error::PartialUpload`, populate it at every raise site (single-node,
merkle, external-signer), and report it in the CLI (human + JSON). Boxed
to keep `Error` under clippy's `result_large_err` threshold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…V2-461)

Large-file single-node (--no-merkle) uploads OOM'd on small hosts: store
concurrency could ramp to the wave size (64) and the send path holds each
~4 MB chunk body in flight, so a wave of large chunks pinned several GB.

Cap store concurrency in store_paid_chunks_with_events by combined in-flight
body bytes (STORE_INFLIGHT_BYTE_BUDGET, 64 MB) instead of chunk count, so
~4 MB chunks drop to ~16 concurrent stores while small chunks are unaffected.

This is the standalone memory fix; no saorsa-core change is required.
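
The arithmetic behind the byte budget can be sketched as below. The constant name and the 64 MB figure come from the commit message (interpreted here as 64 MiB, which is an assumption); `store_concurrency_cap` and its parameters are hypothetical.

```rust
/// In-flight body budget (assumed to mean 64 MiB).
const STORE_INFLIGHT_BYTE_BUDGET: usize = 64 * 1024 * 1024;

/// Limit store concurrency by combined in-flight body bytes.
/// `adaptive_cap` is whatever the count-based limiter currently allows.
fn store_concurrency_cap(adaptive_cap: usize, max_chunk_bytes: usize) -> usize {
    // Guard against division by zero for empty chunks.
    let by_bytes = STORE_INFLIGHT_BYTE_BUDGET / max_chunk_bytes.max(1);
    // Always allow at least one store, even if a single chunk exceeds the budget.
    adaptive_cap.min(by_bytes).max(1)
}
```

With ~4 MiB chunks the budget allows 64 / 4 = 16 concurrent stores, while small chunks remain limited only by the count-based cap.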

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix: continue single-node uploads on partial waves and bound store memory (V2-461)
jacderida and others added 28 commits June 14, 2026 23:49
feat(client): use witnessed SNP quote selection
Consume the witnessed close-group transcript from saorsa-core and compute quorum, vote counts, and fallback quote candidates in ant-client. Quote collection now keeps all quorum-recognised candidates available for reachability fallback, then pays the closest successful close group.

SemVer: feature change; no public ant-client API break expected.
Select the closest witnessed SNP quote set whose paid median issuer is recognised by a close-group majority of the selected peers. This keeps fallback quote candidates available without paying a median issuer that the PUT majority may reject.

SemVer: bug fix; no public ant-client API break expected.
Keep proof quote order stable while ordering PUT targets so the initial store wave favours peers that voted for the paid median issuer. Wire the in-process E2E protocol through AntProtocol::attach_p2p_node and use ant-node's test-only paid close-group override for the local client/storage-node topology.

SemVer: bug fix; no public ant-client API break expected.
…olicy

feat(client): apply witnessed quote policy locally
Re-point ant-protocol + ant-node (runtime optional + test-utils dev-dep)
from feat/witnessed-transcript-policy -> canonical rc-2026.6.2, refresh lock
to saorsa-core 0.26.0-rc.1 / ant-protocol 2.2.0-rc.1 / ant-node 0.12.1-rc.7.
Includes #119 (apply witnessed quote policy locally).
Store public upload DataMaps through the same file upload chunk set so wave and merkle payments cover the shareable DataMap address instead of paying for it in a second post-upload call.

SemVer: bug fix; no public ant-client API break expected.
Remove the [patch."…saorsa-core"] override that pointed at
mickvandijke/saorsa-core@feat/witnessed-view-count-rc-2026.6.2 (scaffold
for building against saorsa-core #135 before it merged). #135 is now on
canonical rc-2026.6.2, so the lock resolves saorsa-core there (79f5ad6).
Verified: cargo check --all-targets --all-features passes.
…nt-rc-2026.6.2

feat(client): widen SNP witnessed quote views
…atch-rc-2026.6.2

fix(client): include public DataMap in upload payment
Re-pin saorsa-core (79f5ad6, #135), ant-node (8f8842a, #146/#147), and
ant-protocol to their current rc-2026.6.2 commits so the lock references
match the branches. Lock-only; no version bump, no tag.
…ss-support

fix: use direct witness support for SNP median
…d-quotes-rc-2026.6.2

fix(client): fetch witnessed quotes concurrently
…witnessed-quotes-rc-2026.6.2"

This reverts commit 3279295, reversing
changes made to e8c7056.
Revert PR #125 (fetch witnessed quotes concurrently)
fix(snp): lower witness quorum for partial transcripts
Strip -rc and pin upstreams to crates.io: ant-protocol 2.2.0,
ant-node 0.13.0 (runtime optional + test-utils dev-dep), via the
re-exported saorsa-core 0.26.0. Hand-rolled (helper doesn't cover the
ant-node dev-deps).
@jacderida merged commit 4d04484 into main Jun 18, 2026
24 checks passed
@jacderida deleted the rc-2026.6.2 branch June 18, 2026 12:01

3 participants