Feat/hot path optimization with external rpc validation#508
Open
oten91 wants to merge 59 commits intofeat/hot-path-optimizationfrom
Open
Feat/hot path optimization with external rpc validation#508oten91 wants to merge 59 commits intofeat/hot-path-optimizationfrom
oten91 wants to merge 59 commits intofeat/hot-path-optimizationfrom
Conversation
When all suppliers in a session are behind the real chain tip, internal consensus considers everything "in sync" because it only compares endpoints against each other. Add optional per-service external RPC endpoints that periodically fetch the real block height as a floor for perceivedBlockNumber, correctly filtering stale suppliers. Supports multiple sources per service for redundancy (max wins). Handles EVM hex, Cosmos decimal, and Solana numeric response formats. Failure-safe: unreachable sources log a warning and are skipped. External heights are written to Redis for cross-replica benefit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
113320b to
ff9606e
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ff9606e to
e932bd2
Compare
jorgecuesta
approved these changes
Feb 18, 2026
Contributor
jorgecuesta
left a comment
There was a problem hiding this comment.
LGTM, just a minor question about it assuming all the endpoints are JSON rpc compatible and a possible optimization.
d1bf40c to
1cb8631
Compare
… mode to external block fetcher
- Replace docker/cli exclude with direct require of v29.2.1 which uses
moby/moby packages with proper Go modules, fixing the range-over-function
compilation error in E2E tests.
- Pin docker/docker to v28.5.1 and dockertest to v3.11.0 to fix broken pipe
error when streaming Docker build context via the API in E2E tests.
- Add type field ("jsonrpc" default or "rest") to ExternalBlockSource config
so Cosmos REST endpoints (GET /status) work alongside JSON-RPC POST.
- Expand .dockerignore to exclude non-essential directories (docusaurus, docs,
research, etc.) reducing build context size.
- Add integration tests against real public RPC endpoints (ETH, Solana,
Osmosis JSON-RPC, Osmosis REST) to validate all chain type parsing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1cb8631 to
25be89b
Compare
Prevents cold-start filtering where the external block source reports the real chain tip before suppliers have reported their heights, causing all endpoints to appear "behind" and get filtered out. Default grace period: 60s. Configurable per-service via `grace_period` in `external_block_sources` config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds ConsumeExternalBlockHeight to Cosmos and Solana QoS instances so external_block_sources works for all chain types, not just EVM. Also fixes EVM: skip Redis writes during grace period to prevent other replicas from picking up the external height before suppliers report. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker image build + container startup can take 5+ minutes in CI, which was consuming the entire 5-minute context timeout before tests even started. This caused all Vegeta test methods to immediately report "canceled" with 0 requests. Move context creation to after getGatewayURLForTestMode() returns, so the 5-minute timeout only governs actual test execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CI workflow already builds the Docker image in Phase 1 and passes it as an artifact to Phase 2. But force_rebuild_image was still true in the default config, causing a redundant ~5 minute rebuild from scratch (with NoCache) before every E2E test run. Set force_rebuild_image=false in the CI config script so the test reuses the pre-built image from Phase 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 5 matrix jobs (3 HTTP + 2 WebSocket) were downloading the Docker image artifact simultaneously, triggering GitHub's secondary rate limit (403 Forbidden). Stagger downloads at 0/5/10/15/20s intervals. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o QoS instances QoS instances are created at startup before external health check rules are loaded, so syncAllowance starts at 0 (disabled). This means stale endpoints were never filtered even when external rules specified a sync_allowance. Fix by using atomic.Uint64 for syncAllowance in both EVM and Cosmos QoS configs, and propagating the value from the health check executor when external rules are loaded/refreshed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eneric QoS Services using generic/NoOp QoS (near, sui, tron) previously had no block height tracking, meaning stale endpoints were never filtered even when external block sources were configured. This converts NoOpQoS from a stateless value type to a stateful pointer type with per-endpoint block height tracking, sync-allowance-based filtering, and external block height consumption — matching the pattern used by Cosmos and Solana QoS. Default behavior is unchanged: when syncAllowance=0 (default), endpoint selection remains purely random. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In multi-replica deployments, each replica updates perceivedBlockHeight locally only. This means Replica A may see block 1000 while Replica B stays at 950, causing B to select stale endpoints. EVM already had full Redis sync; this extends the same pattern to the remaining QoS types. Adds SetReputationService, StartBackgroundSync (periodic Redis reads), and async Redis writes in UpdateFromExtractedData and ConsumeExternalBlockHeight for Solana, Cosmos, and NoOp QoS. Uses simple max semantics (if Redis > local, update) rather than EVM's consensus mechanism. No changes to cmd/main.go needed — existing duck-typed interface checks auto-detect the new methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfigured NewSimpleQoSInstance (used for all EVM services) sets getEVMChainID() to "". However, isChainIDValid() still ran unconditionally, rejecting every observed endpoint — either because it had no chain ID observation yet (errNoChainIDObs) or because the observed chain ID (e.g. "0x38") didn't match "". This caused a flood of warn logs and forced fallback to random endpoint selection. Skip chain ID validation and synthetic eth_chainId checks entirely when no expected chain ID is configured, consistent with the constructor's intent to delegate validation to active health checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…a stale supplier filtering Health checks only run on the leader pod, so non-leader replicas had empty per-endpoint block heights and could not filter stale suppliers. This adds Redis HSET/HGETALL sync for per-endpoint block heights across all 4 QoS types (EVM, Solana, Cosmos, NoOp), enabling all replicas to filter stale endpoints. Write side: async HSET from UpdateFromExtractedData after each health check. Read side: HGETALL on the existing 5s background sync ticker, updating only existing local endpoint entries where Redis has a higher block height. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover both write side (UpdateFromExtractedData writes per-endpoint block heights to Redis) and read side (StartBackgroundSync reads from Redis and updates local endpoint store) for all 4 QoS types: EVM, Solana, Cosmos, and NoOp. Verifies that: - Higher Redis values update local endpoints - Lower Redis values do NOT downgrade local endpoints - Non-existent local endpoints are NOT created from Redis data - Periodic sync picks up new Redis data after startup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Non-leader replicas had empty endpoint stores at startup because health checks only run on the leader pod. The syncEndpointBlocksFromRedis closure would fetch per-endpoint block heights from Redis but silently skip all entries since none existed locally. This meant stale suppliers were never filtered on non-leader pods. Now, EVM/Cosmos/NoOp QoS types create new endpoint entries from Redis data when the endpoint doesn't exist in the local store, enabling block height filtering from the first request. Solana is excluded because its validateBasic() requires health+epoch data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tractBlockHeight Extends extractBlockHeight to handle non-EVM response formats for external block sources on generic chains (near, sui, tron): - Decimal strings without 0x prefix parsed as base-10 (Sui) - Numeric latest_block_height in sync_info objects (NEAR) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e period Add mdbx_panic to the heuristic error indicator patterns so that Erigon nodes with corrupted/full MDBX databases trigger retries on a different supplier instead of returning the error directly to users. Also reduce the external block grace period from 60s to 30s to shorten the cold start window where stale suppliers can leak through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… comet_bft unsupported CometBFT methods (status, block, validators, etc.) sent via JSON-RPC POST were rejected with "unsupported RPC type" when suppliers hadn't staked comet_bft endpoints. Since CometBFT nodes handle both REST and JSON-RPC interfaces on the same endpoint, these requests can safely route to json_rpc endpoints instead. This fallback is a no-op when comet_bft is properly staked. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ng UNKNOWN_RPC NoOp QoS was discarding the RPC type detected by the gateway and always setting UNKNOWN_RPC on service payloads. This caused all generic services (tron, sui, near) to report unknown_rpc in metrics/observations. Now the detected type (json_rpc, rest, etc.) flows through to the payload. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Endpoint block height entries accumulate indefinitely as sessions rotate, causing non-leader replicas to recreate stale entries from Redis on every sync cycle. This adds a lastSeen timestamp to each endpoint entry, updated during filtering when endpoints appear in active sessions. A periodic sweep (every 5s tick) removes entries older than 2h TTL from both local stores and Redis via HDEL. Changes across all QoS types (EVM, Cosmos, Solana, NoOp): - Add lastSeen field to endpoint structs - Add touchEndpoints() to update lastSeen during filtering (separate WLock) - Add sweepStaleEndpoints() to remove entries past TTL - Call sweep in StartBackgroundSync tick loop - Add RemoveEndpointBlockHeights to ReputationService/Storage interfaces - Implement HDEL-based cleanup in Redis storage, delete in memory storage - Update all 6 test mocks with the new interface method Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ove them Entries created by syncEndpointBlocksFromRedis had lastSeen=zero, which the sweep intentionally skips. This meant Redis-synced entries were never cleaned up. Set lastSeen=time.Now() on new entries so they get swept after the 2h TTL if they never appear in an active session. Also fix NoOp UpdateFromExtractedData to preserve existing lastSeen instead of overwriting the entire endpointState struct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Broken/lazy suppliers return empty responses like "result":[],
"result":{}, or bare {}. This adds two layers of defense:
Layer 1 — Heuristic detection (retry on different supplier):
- REST bare {} → rest_empty_object (confidence 0.80, retry)
- JSON-RPC "result":{} → confidence 0.95, MajorErrorSignal (never valid
for any JSON-RPC method)
- JSON-RPC "result":[] → confidence 0.75, MinorErrorSignal (valid for
some methods like eth_getLogs, retry is benign)
Layer 2 — Block height filter (prevent future requests):
- Add ErrInvalidBlockHeightResult sentinel error for extractors to
signal "response had a result but it was unparseable"
- EVM extractor: returns sentinel when eth_blockNumber gets non-string
result ([], {}, number, boolean) or unparseable hex
- Cosmos extractor: returns sentinel when CometBFT JSON-RPC response
has empty container result ({} or [])
- UpdateFromExtractedData (EVM/Cosmos/Solana): stores block height 0
when InvalidBlockHeight flag is set, causing the endpoint to be
filtered (0 < perceived - sync_allowance)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…infrastructure When a relay fails, retries now exclude ALL endpoints behind the same domain (host), not just the exact endpoint address. This prevents wasting retry budget on different supplier endpoints that share the same broken backend (e.g., multiple suppliers behind rel.spacebelt.xyz all returning empty responses). Also stops retrying entirely when all available endpoints are from tried domains, instead of resetting and cycling through the same broken infrastructure with backoff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alid responses Solana's GetServicePayloads() was not setting JSONRPCMethod on the protocol payload, so the heuristic analyzer received an empty method and could not perform method-aware empty array detection. This caused "result":[] from broken suppliers to be classified as jsonrpc_success instead of jsonrpc_invalid_empty_array for scalar methods like getSlot. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…reset A single success was resetting CriticalStrikes to 0, allowing deceptive suppliers to avoid the cooldown system entirely by interleaving fake successes with critical errors. Now strikes decay by 1 per success, requiring sustained good behavior to fully recover. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket handler was missing RPC type validation, allowing connections for services that don't have "websocket" in their configured rpc_types. This let suppliers pump unlimited success signals via WebSocket for services like Solana (json_rpc only), bypassing reputation penalties. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RPCTypeValidator was defined on the Gateway struct and checked in both HTTP and WebSocket handlers, but never instantiated — always nil. This meant RPC type validation was silently skipped for all requests, allowing WebSocket connections to services like Solana that only support json_rpc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
At 50+ critical strikes, `1 << exponent` overflows int64, producing zero or negative cooldown durations. This caused suppliers with 100+ strikes to have instantly-expiring cooldowns instead of the intended 1-hour max. Cap exponent at 7 (5min * 2^7 = 640min > 60min max). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…to all protocols
Some suppliers return a static {"jsonrpc":"2.0","id":1,"result":[]} for
ALL requests regardless of protocol type, including REST and CometBFT
paths. This evades current detection because:
- REST analyzeREST() only looks for error indicators
- CometBFT "result":[] check requires jsonrpcMethod which is empty for GET
- Cosmos health checks lack expected_response_contains validation
Changes:
- Add REST protocol mismatch detection (JSON-RPC response to REST request)
- Add CometBFT empty array detection (no CometBFT method returns arrays)
- Register both as deceptive patterns for CriticalErrorSignal + strikes
- Set JSONRPCMethod on Cosmos JSON-RPC payloads (aligns with EVM/Solana)
- Add expected_response_contains to all 15 Cosmos chain health checks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sei is a dual chain (Cosmos + EVM) but most suppliers only run EVM nodes. The CometBFT health checks (health, status, syncing) were penalizing EVM-only suppliers that correctly reject unknown methods like "status" with "method does not exist". Switch to eth_blockNumber and eth_chainId which all sei suppliers support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spacebelt returns {"result":[]} to eth_blockNumber which passes basic
HTTP 200 check. Adding expected_response_contains: '"0x' catches this
since real EVM responses always contain hex strings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some suppliers return HTTP 200 with
{"error":{"code":-32000,"message":"Failed to call fallback API"}} when
they can't serve archival requests. This was leaking through to clients
as a 200 while the same class of error ("missing trie node") from other
suppliers was correctly caught and retried.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Suppliers returning fast canned responses like {"result":[]} pass
transport-level checks (no error, has bytes), win the hedge race, and
cause the correct response from another supplier to be discarded. The
heuristic later catches the bad response but the good one is gone.
Now the hedge layer runs the same heuristic analysis before declaring
a winner. If the first response fails heuristic validation, it waits
for the second response instead of immediately returning the bad one.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… major The syncing REST health check had reputation_signal: critical_error, causing CriticalErrorSignal (strikes + exponential cooldown) on any health check failure. This was unfairly penalizing legitimate suppliers whose REST endpoint occasionally failed. Changed to major_error for all 14 Cosmos chains. The status JSON-RPC check retains critical_error since its expected_response_contains targets canned responses specifically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…height The session cache key previously included the current block height, which changes every block (~60s). Since sessions span ~50 blocks, this caused a cache miss on every new block for the same session, making the cache ineffective and overloading full nodes with redundant GetSession gRPC calls. The fix computes sessionStartHeight deterministically from the current height and NumBlocksPerSession (already cached via GetSharedParams), so the cache key remains stable for the entire session window. Falls back to current height if SharedParams is unavailable, preserving previous behavior rather than failing requests. Closes #509
The original formula `h - (h % N)` is 0-based, but Shannon sessions start at block 1: [1,60], [61,120], etc. This caused the last block of each session to get a different cache key, resulting in 2 fetches per session instead of 1. Use Shannon's formula `h - ((h-1) % N)` to align cache keys with actual session boundaries. Updated tests to use NumBlocksPerSession=60 matching production config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kout Archival requests (eth_call at old blocks, etc.) hitting non-archival suppliers return "historical state is not available" or similar pruning errors. The heuristic correctly retries these on different suppliers, but the circuit breaker was also marking the entire domain as broken. This caused a death spiral: archival request volume continuously fed the circuit breaker, escalating hit counts into the thousands and locking ALL non-qspider domains out for 30 minutes at a time. On mainnet, arb-one had hit counts of 151,549 on a single domain. Changes: - Add shouldCircuitBreak() to gate circuit breaking on heuristic type - Archival/pruning errors (historical state, state pruned, missing trie node, etc.) skip circuit breaking but still retry on other suppliers - Add IsArchivalRelatedError() to heuristic package for pattern matching - Store reason in Redis (format: "expiry:hitCount:reason") for debugging without needing log access. Backward compatible with old formats. - Add MatchedPattern field to AnalysisResult for callers to inspect - Add diagnostic logging: circuit break events (error level) and skipped circuit breaks (warn level) with reason, domain, and service context - Add logger parameter to DomainCircuitBreaker for structured logging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hedge race failure branch was not setting lastCircuitBreakReason or lastResponseSnippet, causing all circuit breaker logs and Redis entries to show empty reason/response fields. This made it impossible to debug why domains were being locked out via the hedge path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CometBFT requests sent as JSON-RPC POST (e.g., {"jsonrpc":"2.0","method":"status"})
were being routed to the REST validator, which re-detected the RPC type from the URL
path as REST. This caused rest_protocol_mismatch false positives at 0.95 confidence,
triggering circuit breaker domain lockouts across Cosmos chains.
Route these requests to the JSONRPC validator instead, which correctly preserves
RPCType=COMET_BFT and avoids the heuristic false positive.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onses
Tron's HTTP API returns {} for query endpoints when the entity doesn't
exist (e.g., /wallet/getaccount for unactivated accounts). The heuristic
was flagging these as rest_empty_object errors at 0.80 confidence,
triggering circuit breaker domain lockouts and causing ~88% 5xx on tron.
Add path-based whitelist (restEmptyObjectValidPathPrefixes) for /wallet/
and /walletsolidity/ prefixes, matching the existing emptyArrayValidMethods
pattern for JSON-RPC. Pass request path through to REST analyzer via the
existing jsonrpcMethod parameter (empty for REST, now falls back to
payload.Path).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… JSON-RPC processOfElimination auto-assigned JSON_RPC to any HTTP request when it was the only HTTP-based RPC type configured. GET requests to json_rpc-only services (e.g., GET /status) were forwarded as JSON-RPC, causing backend 400 rejections and circuit breaker lockouts. Now validates that requests must be POST to be typed as JSON_RPC. Non-POST requests fall through to inspectPayload, which returns a proper error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add POST /admin/circuit-breaker/clear/{serviceId} endpoint that clears
both in-memory and Redis circuit breaker state. Redis DEL alone is
insufficient because refreshFromRedis merges local entries back —
this endpoint clears local cache first, then Redis.
Also updates pnf_path_rules.yaml sync_allowance values to match
external configuration (eth: 2, poly/arb-one/base: 20).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…are registered InitExternalConfig runs before SetQoSInstances, so the initial SetSyncAllowance calls in refreshExternalConfig find an empty qosInstances map and silently fail. This leaves sync_allowance=0 (disabled) for all services until the first periodic refresh (5min), creating a window where block height filtering is completely bypassed. Fix: SetQoSInstances now re-applies sync_allowance from any already-loaded external configs when QoS instances are registered. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
HTTP 4xx responses indicate the client sent a bad request, not that the domain is broken. Don't punish domains for correctly rejecting malformed requests (e.g., NEAR backends returning PARSE_ERROR for non-JSON-RPC payloads). Only 5xx, transport failures, and heuristic- detected supplier errors should trigger circuit breaks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Log HTTP method, URL path, content-type, user-agent, and first 512 bytes of request body at error level when RPC type detection fails. Helps diagnose sources of bad traffic (e.g., malformed NEAR POSTs). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Shannon protocol layer's heuristic check only used payload.JSONRPCMethod,
which is empty for REST requests. This meant path-aware validation (e.g., the
Tron /wallet/ whitelist for valid {} responses) never received the request path,
causing false positive rest_empty_object detections and circuit breaker lockouts.
Add the same payload.Path fallback that gateway-level checks already have.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… breaker reason - Promote protocol-layer heuristic detection log from Debug to Error for production visibility (needed to identify which JSON-RPC method triggers false positive empty array detection on sei) - Include jsonrpc_method and request_path in the heuristic error log - Append (method=...) to the error string propagated to circuit breaker - Increase Redis circuit breaker reason truncation from 200 to 500 chars so method names aren't cut off in diagnostic queries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ethod
Two false positive sources found via improved diagnostics:
1. Cosmos REST /cosmos/* paths return {} for non-existent entities (e.g.,
/cosmos/slashing/v1beta1/signing_infos/{addr} for validators with no
slashing record). Add /cosmos/ to restEmptyObjectValidPathPrefixes.
2. Sei exposes EVM methods with sei_ prefix (sei_getLogs, sei_getFilterLogs,
etc.) that behave identically to eth_* equivalents. Add to
emptyArrayValidMethods whitelist so "result":[] is not flagged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kout via hedge path When the protocol layer detects an archival error (e.g., "historical state not available", "missing trie node") and returns it as an error, the hedge_failed code path loses the structured heuristicResult. shouldCircuitBreak then sees heuristicResult=nil and treats it as a transport failure → circuit breaks the domain → locks out ALL traffic including non-archival requests. Fix: add error string parameter to shouldCircuitBreak. When heuristicResult is nil but an error is available, fall back to substring matching against archival patterns. This prevents domains from being locked out for serving pruned-state errors while remaining healthy for current-block requests. Affected chains: arb-one (118K hits easy2stake), base (45K hits nodefleet), bsc, and other EVM chains with archival traffic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… lockout Lite fullnodes (e.g., Tron) return plain text like "this API is closed because this node is a lite fullnode" for unsupported endpoints. Previously this was classified as generic non_json_response, triggering domain-level circuit breaking even though the domain works fine for other requests. Now the structural analyzer detects capability limitation patterns in non-JSON responses and returns a specific reason (non_json_capability_limitation) that shouldCircuitBreak recognizes as non-circuit-breakable — same treatment as archival errors. The request still retries on a different supplier. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When shouldCircuitBreak returns false in the batch processing path, the else branch accessed checkResult.HeuristicResult.Reason without a nil check. Moved heuristicReason extraction before the if/else so it's available in both branches with proper nil safety. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CometBFT endpoints always return JSON-RPC formatted responses (e.g.,
{"jsonrpc":"2.0","id":-1,"result":{}}) even for GET requests. This was
causing two types of false positives:
1. analyzeREST flagged CometBFT GET responses as rest_protocol_mismatch
because it saw "jsonrpc" in a REST response — but CometBFT paths like
/health, /status, /block always return JSON-RPC format by design.
2. analyzeJSONRPC flagged {"result":{}} as jsonrpc_empty_object_result
when rpcType arrived as JSON_RPC instead of COMET_BFT (due to
rpc_type_fallbacks routing CometBFT through JSON_RPC endpoints).
Fix: Add method-aware and path-aware CometBFT detection so the heuristic
recognizes valid CometBFT responses regardless of how they were routed.
These checks become redundant safety nets once suppliers can stake
comet_bft endpoints directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
updating to go 1.25
adding optional external rpc validation