
Local network setup for indexing payments #69

Draft
RembrandtK wants to merge 30 commits into main from ip_local_network

Conversation


RembrandtK commented Apr 14, 2026

Now partly merged to main; the gip-88 branch is being worked on as a replacement.

Adds the deployment path for the GIP-0088 contract bundle (REO +
IssuanceAllocator + RecurringAgreementManager):

- New graph-contracts-issuance container for the Phase 4/5 deployment,
  wired after graph-contracts-horizon and running the issuance package
  deploy sequence (REO, IA, RAM, activation).
- Rename existing graph-contracts container to graph-contracts-horizon
  to distinguish it from the new issuance container. Dev-override files
  split correspondingly into graph-contracts-only.yaml and
  graph-contracts-issuance.yaml.
- Rename the Kafka topic from indexer_daily_metrics to
  eligibility_oracle_state to match the REO aggregator output name.
- Contract naming: the issuance deploy produces RewardsEligibilityOracleA
  (and B/Mock variants); consumers updated to read the A variant from
  issuance.json.
- Horizon compatibility: use getStake instead of hasStake in
  indexer-agent run.sh.
- Add an optional KAFKA_TOPIC_ENVIRONMENT env var that all producers and
  consumers append to their topic names (e.g. gateway_queries_local).
  Leave it empty for the default topic names. All consumers must agree on
  the value; it is centralised in shared/lib.sh via the kafka_topic()
  helper (see the sketch after this list).
- Run redpanda as root so rpk topic bootstrap operations can write to
  the data directory without permission errors.
- Enable dipper's Redpanda signal consumer: kafka.brokers, topic, and
  consumer_group in generated config.json.
- Fix dipper config to resolve recurring_collector address from the
  horizon address book (moved to a different JSON file layout).
- Enable indexer-service DIPs gRPC server (listen on DIPS_PORT, expose
  to local-network consumers).
- Map chain ID 1337 to hardhat in dipper's additional_networks so the
  local hardhat chain is recognised.
- Remove docs/indexing-payments/RecurringCollectorDeployment.md —
  superseded by graph-contracts-issuance container deployment flow.
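
A minimal sketch of what the kafka_topic() helper in shared/lib.sh could look like; the function body is an assumption, only the suffixing behaviour and the gateway_queries_local example come from the item above:

    # shared/lib.sh (hypothetical sketch)
    # Derives the per-environment topic name so every producer and consumer
    # that sources this file agrees on the suffix.
    kafka_topic() {
      local base="$1"
      if [ -n "${KAFKA_TOPIC_ENVIRONMENT:-}" ]; then
        echo "${base}_${KAFKA_TOPIC_ENVIRONMENT}"
      else
        echo "$base"
      fi
    }

    # e.g. in a run.sh:
    #   TOPIC="$(kafka_topic gateway_queries)"  # gateway_queries_local when KAFKA_TOPIC_ENVIRONMENT=local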
GHCR packages for dipper-service and subgraph-dips-indexer-selection are
not published, so point both versions at :local tags built from sibling
repos. Also enable the indexing-payments profile by default on this branch.
Switches four runtime services from clone-and-build wrappers
(FROM debian:bookworm-slim + ARG *_COMMIT + cargo build) to thin
image-consumption wrappers (FROM ghcr.io/...:${VERSION}). Each wrapper
now just adds the tools run.sh needs (jq, curl, rpk) and overrides
ENTRYPOINT with the local-network run.sh.

Conversions:

- eligibility-oracle-node → ghcr.io/edgeandnode/eligibility-oracle-node:main.
  Updates run.sh for the upstream config schema change
  ([[blockchain.contracts]]/[[blockchain.chains]] arrays, drop
  staleness_threshold_secs) and the contract rename
  (RewardsEligibilityOracle → RewardsEligibilityOracleA) across scripts
  and docs.
- gateway → ghcr.io/edgeandnode/graph-gateway:sha-50c7081 (pinned to
  upstream main HEAD; CI publishes sha-<short> tags only).
- tap-escrow-manager → ghcr.io/edgeandnode/tap-escrow-manager:sha-df659cf.
  Symlinks /opt/tap-escrow-manager to /usr/local/bin so run.sh can
  invoke the binary by name.
- graph-node bumped v0.37.0 → v0.42.1.
- indexer-tap-agent bumped v1.12.2 → v2.0.0.

Env var renames: *_COMMIT → *_VERSION for each converted dep.
Profile rename: rewards-eligibility → eligibility-oracle (service name
eligibility-oracle-node retained to keep the contract-vs-node
distinction visible). Env var ELIGIBILITY_ORACLE_VERSION renamed to
ELIGIBILITY_ORACLE_NODE_VERSION for the same reason.
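
For illustration only, the renamed pins could read like this in .env; apart from ELIGIBILITY_ORACLE_NODE_VERSION the variable names are assumed to follow the *_VERSION convention, and the tags are the ones listed above:

    # .env (hedged sketch)
    ELIGIBILITY_ORACLE_NODE_VERSION=main
    GATEWAY_VERSION=sha-50c7081
    TAP_ESCROW_MANAGER_VERSION=sha-df659cf
    GRAPH_NODE_VERSION=v0.42.1
    INDEXER_TAP_AGENT_VERSION=v2.0.0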

Extend CONTRACTS_COMMIT from short sha to full 40-char sha.

Dev-override restructure: drop bundled graph-contracts.yaml (which
mixed contracts + subgraph concerns), rename
graph-contracts-only.yaml → graph-contracts-horizon.yaml, add new
network-subgraph.yaml for the subgraph-deploy override alone, rename
GRAPH_CONTRACTS_SOURCE_ROOT → NETWORK_SUBGRAPH_SOURCE_ROOT to match
what it actually points at. Add a note to compose/dev/README.md that
image-tag consumption is preferred over these overrides, which have not
all been recently tested.

COMPOSE_PROFILES default includes all four profiles; comment updated
to flag that indexing-payments requires GHCR auth.

Note: graphprotocol/rewards-eligibility-oracle is a *different*
Python-based project; the local-network dep is the Rust one at
edgeandnode/eligibility-oracle-node.
Sequential cast send calls with --confirmations=0 returned before the tx was
visible in chain state, so the next tx was built with a stale nonce and got
'nonce too low' from the chain. Default cast send behaviour waits for the tx
receipt, which serializes the approve/mint pairs correctly.
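
A hedged before/after sketch of the nonce race; the token/curation addresses, amounts, and function signatures are placeholders rather than the actual start-indexing script:

    # Broken: --confirmations=0 returns before the tx is in chain state,
    # so the next send is built with a stale nonce ('nonce too low').
    cast send "$GRT" "approve(address,uint256)" "$CURATION" 1000ether \
      --private-key "$PK" --rpc-url "$RPC" --confirmations=0
    cast send "$CURATION" "mint(bytes32,uint256,uint256)" "$DEPLOYMENT" 1000ether 0 \
      --private-key "$PK" --rpc-url "$RPC" --confirmations=0

    # Fixed: the default behaviour waits for each receipt, serializing the pair.
    cast send "$GRT" "approve(address,uint256)" "$CURATION" 1000ether \
      --private-key "$PK" --rpc-url "$RPC"
    cast send "$CURATION" "mint(bytes32,uint256,uint256)" "$DEPLOYMENT" 1000ether 0 \
      --private-key "$PK" --rpc-url "$RPC"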

Cascading effect: when start-indexing died partway through the
approve+mint curation loop, allocations were never created, which starved
dipper's topology fetch (empty gateway API responses interpreted as 'failed
to fetch subgraphs info' and retried indefinitely).
…s CI

Match graphprotocol/contracts' own CI setup action
(.github/actions/setup/action.yml); a sketch of the resulting steps
follows the list:

- node 22 (was 23)
- apt: libudev-dev libusb-1.0-0-dev (native deps of
  hardhat-secure-accounts/ledger toolchain)
- corepack enable only; pnpm version resolved per-directory from each
  project's packageManager field (pnpm 10.x for Horizon,
  pnpm 9.0.6 for the DataEdge snapshot — corepack downloads on demand)
- pnpm install --frozen-lockfile (was --ignore-scripts; the flag was a
  workaround for the missing libudev, not an intentional choice)
- yarn@1.22.22 prepared just-in-time for the TAP step, not globally
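
A hedged sketch of those steps inside the image build; the commands are illustrative, the versions and flags are the ones listed above:

    # Native deps for hardhat-secure-accounts / the ledger toolchain
    apt-get update && apt-get install -y libudev-dev libusb-1.0-0-dev

    # corepack resolves pnpm per directory from each project's packageManager field
    corepack enable

    # Full install, scripts included (--ignore-scripts was only a workaround
    # for the missing libudev)
    pnpm install --frozen-lockfile

    # yarn prepared just before the TAP step rather than installed globally
    corepack prepare yarn@1.22.22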

Verified by building the image and running horizon Phase 1-3 plus the
issuance container end-to-end against a fresh chain.
…4 services

Previously a single `graph-contracts-horizon` container ran three
independent deploys sequentially (horizon/subgraph-service, legacy TAP,
DataEdge) and a separate `graph-contracts-issuance` container duplicated
the full contracts clone+build. The split:

- `graph-contracts`           — Phase 1: horizon + subgraph-service
- `graph-contracts-issuance`  — GIP-0088 (REO + IA + RAM + activation)
- `graph-contracts-tap`       — legacy TAP contracts (separate repo)
- `graph-contracts-data-edge` — DataEdge (older pinned contracts snapshot)

All four services share a single multi-stage Dockerfile at
containers/core/graph-contracts. `base` and `contracts-src` stages are
shared: `contracts` and `issuance` both `FROM contracts-src`, so the
graphprotocol/contracts workspace is cloned, installed, and built exactly
once instead of twice. `tap` and `data-edge` share only `base` since they
use different repos/commits. Each compose service picks its stage via
`build.target`.
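
For illustration, picking a stage per service is roughly equivalent to building the shared Dockerfile with an explicit --target; the stage names are from the description above, and compose does this via `build.target` rather than these manual commands:

    docker build --target contracts -t graph-contracts:local           containers/core/graph-contracts
    docker build --target issuance  -t graph-contracts-issuance:local  containers/core/graph-contracts
    docker build --target tap       -t graph-contracts-tap:local       containers/core/graph-contracts
    docker build --target data-edge -t graph-contracts-data-edge:local containers/core/graph-contracts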

Runtime dependency graph:

    chain
      ├─► graph-contracts ─┬─► graph-contracts-issuance
      │                    └─► graph-contracts-tap
      └─► graph-contracts-data-edge

`graph-contracts` and `graph-contracts-data-edge` run in parallel; after
`graph-contracts` completes, `graph-contracts-issuance` and
`graph-contracts-tap` run in parallel. Previously all four deploys were
serialized inside one container.

Downstream `depends_on` updated per service:
- block-oracle       → graph-contracts + graph-contracts-data-edge
- indexer-agent      → graph-contracts + graph-contracts-tap
- subgraph-deploy    → graph-contracts + graph-contracts-tap + graph-contracts-data-edge
- tap-aggregator     → graph-contracts + graph-contracts-tap
- ready              → all four contract services

Services whose contract dependency flows transitively through
subgraph-deploy or indexer-agent (gateway, indexer-service, tap-agent,
tap-escrow-manager, etc.) needed no changes.

Also renames the dev overlay `compose/dev/graph-contracts-horizon.yaml`
to `graph-contracts.yaml` and updates references in `.env`,
`compose/dev/README.md`, and `graph-contracts-issuance.yaml`.

Verified end-to-end: all four contract services deploy cleanly against a
fresh chain in the expected parallel order, and subgraph-deploy +
indexer-agent + tap-aggregator all successfully read the produced
address books (horizon.json, subgraph-service.json, tap-contracts.json,
block-oracle.json, issuance.json) and start normally.
…-contracts

DataEdge was previously cloned from an older contracts commit
(bdc66135e7700e9a4dcd6a4beac585337fdb9c21) because that was the last
commit where packages/data-edge built under pnpm 9 + hardhat v2 + ethers v5
with the @tenderly/hardhat-tenderly plugin. Everything else in the repo
moved to pnpm 10 + ethers v6 and newer hardhat plugins, but
packages/data-edge has since been migrated upstream — it now builds
cleanly as part of the current CONTRACTS_COMMIT workspace, with no
Tenderly plugin (eliminating a noisy 500 error we were getting every
deploy).

The contract source (DataEdge.sol / EventfulDataEdge.sol) is essentially
identical across the two commits — only NatSpec comments differ — so
switching to the current commit deploys the same bytecode.

Consequences:
- `data-edge` stage dropped from the Dockerfile. No separate clone,
  no pnpm 9 corepack dance, no second contracts install.
- `graph-contracts-data-edge` compose service removed.
- `data-edge.run.sh` deleted; its logic moves into `contracts.run.sh`
  as a second phase that runs from /opt/contracts/packages/data-edge
  (already built by the shared `contracts-src` stage).
- `block-oracle.json` is now written by `graph-contracts` itself.
- Downstream `depends_on: graph-contracts-data-edge` references
  (block-oracle, subgraph-deploy, ready) replaced with the existing
  `graph-contracts` dependency — no new edges, just fewer.

Verified end-to-end: graph-contracts deploys Phase 1 + Phase 2 in
sequence, block-oracle.json is written with the DataEdge address, and
subgraph-deploy successfully consumes it to deploy the block-oracle
subgraph.

Net: 4 contract services → 3, one duplicate contracts clone eliminated,
Tenderly error noise gone.
… conflicts

Rootless Docker's RootlessKit port manager races on common ports (8081,
8082, 9092, 9644) during concurrent container startup. Move Redpanda
host-published ports to 18xxx/19xxx range and drop the internal Kafka
listener (9092) host mapping entirely — host access uses the EXTERNAL
listener on 29092.

Decouple REDPANDA_KAFKA_PORT from run.sh scripts: all container-to-
container Kafka connections now hardcode the internal port 9092 instead
of referencing an env var that was conflating host and internal ports.
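
A hedged illustration of the resulting listener split; the rpk invocations are examples, not scripts from the repo:

    # Container-to-container: hardcoded internal listener
    rpk topic list --brokers redpanda:9092

    # From the host: EXTERNAL listener published on 29092
    rpk topic list --brokers localhost:29092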
Tests assumed only one active allocation per deployment, causing
"Already allocating to the subgraph deployment" errors when duplicates
existed. Now close all active allocations for the target deployment
before recreating.

Also batch block mining via anvil_mine(count, 12) instead of per-block
evm_increaseTime + evm_mine (2N → 1 RPC call per chunk), and reduce
unnecessary epoch advances (pre-existing allocations don't need 2 epoch
advances to close, and creating allocations needs no advance at all).
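
Expressed as cast rpc calls for illustration (the tests issue the same JSON-RPC methods from Rust; the chunk size of 10 blocks is a placeholder):

    # Before: two RPC round-trips per block
    for _ in $(seq 1 10); do
      cast rpc evm_increaseTime 12 --rpc-url "$RPC" > /dev/null
      cast rpc evm_mine --rpc-url "$RPC" > /dev/null
    done

    # After: one anvil_mine call mines the whole chunk, 12s apart
    cast rpc anvil_mine 10 12 --rpc-url "$RPC" > /dev/null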
…e, OracleA)

Upstream contracts renamed getRewardsEligibilityOracle to
getProviderEligibilityOracle and the deployment key from
RewardsEligibilityOracle to RewardsEligibilityOracleA.
…rage

Tests need to pause/unpause the REO contract. Grant PAUSE_ROLE to
ACCOUNT0 during contract setup (via ACCOUNT1 which holds GOVERNOR_ROLE).
Replace the default serial group (all 34 tests sequential) with named
groups so non-conflicting tests run in parallel:

- serial(alloc): allocation/denial/rewards tests (16 tests)
- serial(reo): REO governance config tests (11 tests)
- serial(staking): stake/provision tests (3 tests)
- no serial: pure reads + reverts (14 tests)

The three serial groups run independently, so fast reo/staking tests
no longer wait behind slow epoch-advancing allocation tests.

Also make contract_not_paused self-healing: if a prior test left the
REO paused (e.g. pause_blocks_writes interrupted by --fail-fast), it
unpauses to recover rather than failing.
Anvil v1.0.0 (April 2025) prunes historical state aggressively despite
--preserve-historical-states / --slots-in-an-epoch / --transaction-block-keeper
flags: empirically only ~15 blocks are retained with the flags vs ~10
without them (per the AnvilHistoricalStateRetention task). Graph-node
hits BlockOutOfRangeError
on per-block eth_calls during test runs, kills its block stream with a
spurious 'possible reorg detected' loop, and never recovers.

Foundry shipped a state-retention fix between 1.0.0 and 1.5.0. Verified
2026-04-29 against ghcr.io/foundry-rs/foundry:stable (anvil 1.5.1):
eth_getBalance and eth_getCode succeed at all probed blocks 1..3000
after mining 3000 blocks, vs old anvil where only the head block is
queryable.
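
The check amounts to mining a few thousand blocks and probing old heights, roughly as below; the addresses are placeholders, and the actual probe is the AnvilHistoricalStateRetention task mentioned above:

    cast rpc anvil_mine 3000 12 --rpc-url "$RPC" > /dev/null

    # On anvil 1.5.x these succeed at every probed height 1..3000;
    # on 1.0.0 only blocks near the head still have state.
    cast balance "$PROBE_ADDR" --block 1    --rpc-url "$RPC"
    cast balance "$PROBE_ADDR" --block 1500 --rpc-url "$RPC"
    cast code    "$PROBE_CONTRACT" --block 3000 --rpc-url "$RPC"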

Bumps the four foundry pins consistently (chain runtime, indexer-agent /
start-indexing / graph-contracts cast tooling) and drops the now-vestigial
anvil flags from chain/run.sh — they were no-ops on v1.0.0 and aren't
needed on :stable.
…re-before-assert

Adds TestNetwork::ensure_active_allocation() that returns an active
allocation, creating one from a closed deployment if a prior test
panicked before restoring. Tests that previously started with
get_allocations + filter-for-active now fail gracefully when state is
dirty instead of letting failures cascade through the suite.

REO governance tests that toggle validation / eligibility-period /
oracle-timeout now restore state before asserting, so a failing
assertion no longer leaks state into the rest of the run.
Replaces the per-block evm_increaseTime + evm_mine pair with a single
anvil_mine call that advances 12s per block internally. Halves the RPC
round-trips and drops the per-chunk subgraph-catchup wait (not needed
once the chain retains historical state).
…istener

Deploys the graphprotocol/indexing-payments-subgraph alongside the other
protocol subgraphs, via multi-stage COPY from a per-branch image built
with `just build-image` in that repo's worktree (INDEXING_PAYMENTS_SUBGRAPH_VERSION).

Connects dipper's chain_listener to the deployed subgraph so agreements
transition from Created to AcceptedOnChain when indexers accept on-chain,
instead of expiring.

Adds indexer-agent -> subgraph-deploy compose dependency so the agent
observes the indexing-payments deployment at startup and marks it as
offchain via INDEXER_AGENT_OFFCHAIN_SUBGRAPHS. Without this the reconciler
pauses the subgraph (no allocation, no rule) and chain_listener stalls.

Also exports INDEXING_PAYMENTS_SUBGRAPH_ENDPOINT to indexer-agent so its
unconditional indexingPaymentsSubgraph SubgraphClient construction has
an endpoint to read. Without it, the client throws 'Cannot read
properties of undefined (reading status)' on startup before the
management API comes up.
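
As a sketch, the two additions to the agent environment; the endpoint URL and the deployment-ID variable are assumptions, only the env var names come from the text above:

    # Keep the indexing-payments deployment synced offchain instead of pausing it
    INDEXER_AGENT_OFFCHAIN_SUBGRAPHS="$INDEXING_PAYMENTS_DEPLOYMENT_ID"

    # Endpoint for the agent's unconditional indexingPaymentsSubgraph SubgraphClient
    INDEXING_PAYMENTS_SUBGRAPH_ENDPOINT="http://graph-node:8000/subgraphs/name/indexing-payments"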
Extend the helper to query the network subgraph for a signalled deployment
when the management API has no allocations at all (closed or active).
Replace inline active-allocation lookups in close_allocation_collects_rewards
and the poi_normal_claim restore step with ensure_active_allocation calls.

Preserve the close-all-active-allocs loop in close_allocation_collects_rewards
(matching close_and_recreate_allocation): indexer-agent may auto-create
extra allocations on the same deployment, so closing only the one returned
by ensure_active_allocation would leave a stale active alloc that breaks
the subsequent create_allocation with "Already allocating".
The audit-fix-2 REO has no whenNotPaused guards, so setEligibilityValidation
and renewIndexerEligibility succeed while paused. Update pause_blocks_writes
to verify both writes complete (not revert) during pause and after unpause.
…subgraph}, dipper

PRs landed 2026-04-30 (indexer#1209, indexer-rs#1028,
indexing-payments-subgraph#8) added workflow_dispatch to the publish
workflows, enabling :sha-<short> tags for the DIPs integration branches.
Switch INDEXER_AGENT_VERSION, INDEXER_SERVICE_RS_VERSION,
INDEXER_TAP_AGENT_VERSION, INDEXING_PAYMENTS_SUBGRAPH_VERSION,
DIPPER_VERSION from `local` to those published shas, removing the need
for parallel `just build-image` workflows in source-clone worktrees.

scripts/deps.sh (the source-clone status/pull/build helper) is no
longer needed; it has been moved out of the repo to ../deps.sh as part
of this change.
IISA changes are now merged to main and published as v2.3.0. Drop the
local-build requirement and consume the released image instead.
graphprotocol/contracts pinned engines.node ^24 in d29ea286e (.nvmrc +
package.json engines field). pnpm install --frozen-lockfile against any
post-d29ea286e CONTRACTS_COMMIT now refuses node 22 with
ERR_PNPM_UNSUPPORTED_ENGINE. Bump the base stage to node:24-bookworm-slim
to keep contracts-src builds working.
Picks up the SS-side localNetwork governor fix (7453b59b8) that aligns
DisputeManager / SubgraphService ProxyAdmin ownership with ACCOUNT1, the
account issuance.run.sh signs upgrade txs with. Without this, the
GIP-0088 upgrade phase reverted with OwnableUnauthorizedAccount mid-batch.

Also includes the migrate-config governor bumps (2c07eed7f horizon,
3117e9433 SS) which are not load-bearing for local-network but keep the
sibling configs consistent with the m.getAccount(1) convention.

Drop the over-specific reo-deployment-3 comment in favour of a generic note.

Stack verified: docker compose down -v && up -d completes cleanly,
graph-contracts-issuance runs through all four GIP-0088 phases (deploy,
configure, transfer, upgrade) with 44 contracts synced.
The indexer-agent's auto-reconciler maintains an allocation per
discovered subgraph deployment. Convenient for human use of
local-network, but the integration tests close+recreate allocations
explicitly and race the reconciler — the agent recreates an allocation
between a test's close and create, and the test fails with `Already
allocating to the subgraph deployment`.

Activate this override for test runs to keep the agent in manual mode:

  COMPOSE_FILE=docker-compose.yaml:compose/dev/manual-allocation.yaml \
    docker compose up -d

Verified locally: removes 2 of 4 cluster A failures from the test
suite; baseline 38/6 → 39/5 (only `close_and_recreate_allocation` and
`poi_allocation_too_young` still trip on auto-allocator state).
Root justfile wraps the high-traffic ops (up/down/logs, restart, reset,
connect, mine, advance-epoch, test). The tests/justfile default recipe
switched from running tests to listing recipes.
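
Typical usage from the repo root (recipe names from the sentence above; arguments and the behaviour comments are illustrative):

    just up              # bring the stack up with the default profiles
    just logs dipper     # follow a single service's logs
    just mine 10         # mine blocks on the local chain
    just advance-epoch   # advance the protocol epoch
    just test            # run the integration test suite
    just down            # tear the stack down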
…iles

Versions are always supplied via compose build args from .env;
adding a 'latest' default would mask misconfiguration.
…ring

Bumps gateway to main's pin, which includes #1179 removing horizon
transition code and tap v1 compat. Drops the now-unused
legacy_dispute_manager / legacy_verifier from gateway config and the
matching legacy address stubs from contracts.run.sh. Drops
receipts_verifier_address (V1, deprecated/ignored by indexer-rs since
#929) from indexer-service and tap-agent configs.

[horizon] enabled = true blocks remain — still required at the pinned
indexer-rs sha-853f303 (validation drops in upstream #1014, not yet in
the DIPs branch).