
Workflow fix #2685

Open
shagun-singh-inkeep wants to merge 6 commits into main from workflow-fix-reset

Conversation

@shagun-singh-inkeep
Collaborator

No description provided.

@vercel

vercel bot commented Mar 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
agents-api | Ready | Preview, Comment | Mar 13, 2026 9:39pm
agents-docs | Error | | Mar 13, 2026 9:39pm
agents-manage-ui | Ready | Preview, Comment | Mar 13, 2026 9:39pm


@changeset-bot

changeset-bot bot commented Mar 13, 2026

🦋 Changeset detected

Latest commit: e1d47f2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-core Patch
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@changeset-bot

changeset-bot bot commented Mar 13, 2026

⚠️ No Changeset found

Latest commit: 8a88bd2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@pullfrog
Contributor

pullfrog bot commented Mar 13, 2026

TL;DR — Replaces the per-trigger daisy-chaining workflow architecture with a single centralized scheduler workflow that polls a new trigger_schedules runtime table every 60 seconds and dispatches one-shot workflows for each due trigger. This eliminates the complex adoption/supersession logic, removes the scheduled_workflows manage-DB dependency, and adds a post-deploy CI hook to restart the scheduler on the latest Vercel deployment.

Key changes

  • schedulerWorkflow + SchedulerService — New singleton long-lived workflow that ticks every 60 s, checks if it's still the active scheduler via scheduler_state, and dispatches due triggers.
  • triggerDispatcher — New dispatch layer: finds due rows in trigger_schedules, claims them with optimistic locking, starts one-shot scheduledTriggerRunnerWorkflow instances, and rolls back on failure.
  • trigger_schedules + scheduler_state tables — Two new runtime-DB tables (migration 0023) with a partial index for efficient dispatch queries.
  • ScheduledTriggerService rewrite — Lifecycle hooks (onTriggerCreated/Updated/Deleted) now upsert/delete rows in trigger_schedules instead of managing per-trigger workflow runs via DoltgreSQL.
  • scheduledTriggerRunnerWorkflow simplification — Converted from a daisy-chaining loop (sleep → execute → chain next) to a one-shot workflow (create invocation → execute with retries → done).
  • /api/deploy/restart-scheduler route — New deploy hook endpoint authenticated via INKEEP_AGENTS_RUN_API_BYPASS_SECRET, called by CI after Vercel promotion.
  • vercel-production.yml — New restart-scheduler job that curls the deploy hook after promotion.
  • computeNextRunAt extraction — Cron/one-time next-run logic extracted into a shared pure function used by both the service layer and the dispatcher.
  • Reconciliation handler simplification — scheduled_triggers check handler now returns empty results since per-trigger workflow tracking is removed.

Summary | 23 files | 2 commits | base: main ← workflow-fix-reset


Centralized schedulerWorkflow replaces per-trigger daisy-chaining

Before: Each enabled trigger spawned its own long-lived workflow that slept until the next execution, ran the agent, then daisy-chained a fresh workflow for the next iteration — with complex adoption and supersession logic in checkTriggerEnabledStep.
After: A single schedulerWorkflow runs in a loop (60 s ticks), querying trigger_schedules for due rows. Each trigger execution is a stateless one-shot workflow dispatched by triggerDispatcher.

The scheduler registers itself in scheduler_state (singleton row) and self-terminates if a newer run supersedes it. The dispatcher uses optimistic claim-based locking (claimedAt column) to prevent double-dispatch in multi-instance scenarios, with rollback on workflow start failure.
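The claim semantics can be sketched as a compare-and-set in memory (a minimal sketch with illustrative names; the actual dispatcher does this with a conditional UPDATE on trigger_schedules):

```typescript
// Minimal in-memory sketch of optimistic claim-based locking, assuming a
// claimedAt field that must be null before a dispatcher may claim a row.
// Names (TriggerSchedule, tryClaim, release) are illustrative, not the PR's API.
type TriggerSchedule = {
  scheduledTriggerId: string;
  nextRunAt: string;
  claimedAt: string | null;
};

// Claim succeeds only if the row is currently unclaimed (compare-and-set).
function tryClaim(row: TriggerSchedule, now: string): boolean {
  if (row.claimedAt !== null) return false; // another instance got here first
  row.claimedAt = now;
  return true;
}

function release(row: TriggerSchedule): void {
  row.claimedAt = null;
}

// Two dispatchers racing for the same row: only one wins.
const row: TriggerSchedule = {
  scheduledTriggerId: 't1',
  nextRunAt: '2026-03-13T00:00:00Z',
  claimedAt: null,
};
const a = tryClaim(row, '2026-03-13T00:00:01Z');
const b = tryClaim(row, '2026-03-13T00:00:02Z');
// a === true, b === false
```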

How does supersession work on deploy?

On Vercel, a new deployment triggers the restart-scheduler CI job, which POSTs to /api/deploy/restart-scheduler. This calls startSchedulerWorkflow(), which inserts a new currentRunId into scheduler_state. The old scheduler detects it's no longer current on its next tick via checkSchedulerCurrentStep and exits gracefully. On postgres/local worlds, the scheduler starts on server boot via recoverOrphanedWorkflows flow in index.ts.
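A minimal sketch of the supersession check, assuming a singleton state object standing in for the scheduler_state row (names are illustrative):

```typescript
// Sketch of scheduler supersession via a singleton state row. Illustrative
// only; the real code uses scheduler_state and checkSchedulerCurrentStep.
const schedulerState = { currentRunId: 'run-old' };

// Each tick, a scheduler checks whether it is still the registered run.
function isCurrentScheduler(myRunId: string): boolean {
  return schedulerState.currentRunId === myRunId;
}

// The deploy hook supersedes the old scheduler by registering a new run id.
function restartScheduler(newRunId: string): void {
  schedulerState.currentRunId = newRunId;
}

restartScheduler('run-new');
// The old scheduler sees it is superseded and exits on its next tick:
// isCurrentScheduler('run-old') === false
// isCurrentScheduler('run-new') === true
```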

schedulerWorkflow.ts · SchedulerService.ts · schedulerSteps.ts


triggerDispatcher — claim-based dispatch with rollback

Before: Trigger execution was tightly coupled to the per-trigger workflow lifecycle (sleep → wake → execute → chain).
After: dispatchDueTriggers() queries trigger_schedules for enabled rows with next_run_at <= now and claimed_at IS NULL, claims each row, advances next_run_at (or disables for one-time triggers), starts a one-shot workflow, and releases the claim. On workflow start failure, the schedule is rolled back.
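The advance-or-disable step can be sketched with a fixed interval standing in for real cron evaluation (the actual computeNextRunAt parses cron expressions; this is a simplified stand-in):

```typescript
// Simplified sketch of the advance-or-disable decision made at dispatch time.
// A fixed interval replaces real cron parsing to keep the example self-contained.
type Schedule = { nextRunAt: number; enabled: boolean; oneTime: boolean };

function advanceAfterDispatch(s: Schedule, intervalMs: number): Schedule {
  if (s.oneTime) {
    // One-time triggers are disabled rather than rescheduled.
    return { ...s, enabled: false };
  }
  return { ...s, nextRunAt: s.nextRunAt + intervalMs };
}

const cron = advanceAfterDispatch({ nextRunAt: 1000, enabled: true, oneTime: false }, 60_000);
const once = advanceAfterDispatch({ nextRunAt: 1000, enabled: true, oneTime: true }, 60_000);
// cron.nextRunAt === 61000; once.enabled === false
```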

triggerDispatcher.ts · computeNextRunAt.ts


trigger_schedules + scheduler_state runtime tables

Before: Workflow state lived in the manage DB's scheduled_workflows table (DoltgreSQL, branch-scoped).
After: Two new runtime-DB tables: trigger_schedules (composite PK on tenant_id + scheduled_trigger_id, partial index on next_run_at for dispatch) and scheduler_state (singleton row tracking the active scheduler run).

Table | Purpose | Key columns
trigger_schedules | Materialized view of enabled triggers for polling | next_run_at, claimed_at, enabled
scheduler_state | Tracks the active scheduler workflow | current_run_id, deployment_id
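A hypothetical sketch of what the DDL might look like, inferred from the columns listed above (the authoritative schema is in migration 0023; the types and constraints here are assumptions):

```sql
-- Hypothetical shape of the two runtime tables; the actual DDL lives in
-- 0023_amazing_romulus.sql and may differ in types and constraints.
CREATE TABLE trigger_schedules (
  tenant_id            text NOT NULL,
  scheduled_trigger_id text NOT NULL,
  next_run_at          timestamptz,
  claimed_at           timestamptz,
  enabled              boolean NOT NULL DEFAULT true,
  PRIMARY KEY (tenant_id, scheduled_trigger_id)
);

-- Partial index so the dispatch query only scans unclaimed, enabled, due rows.
CREATE INDEX trigger_schedules_due_idx
  ON trigger_schedules (next_run_at)
  WHERE enabled AND claimed_at IS NULL;

CREATE TABLE scheduler_state (
  id             integer PRIMARY KEY DEFAULT 1 CHECK (id = 1), -- singleton row
  current_run_id text,
  deployment_id  text
);
```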

runtime-schema.ts · 0023_amazing_romulus.sql · triggerSchedules.ts · schedulerState.ts


ScheduledTriggerService rewrite — sync to runtime table

Before: onTriggerCreated/Updated/Deleted resolved DoltgreSQL refs, managed scheduled_workflows records, and started/stopped per-trigger workflow runs.
After: These hooks call upsertTriggerSchedule / updateTriggerScheduleEnabled / deleteTriggerSchedule on the runtime DB. The centralized scheduler picks up changes on its next tick.
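The hook-to-table sync can be modeled in memory (illustrative sketch; the real hooks write to the runtime DB via upsertTriggerSchedule and friends):

```typescript
// In-memory model of the lifecycle-hook sync. Illustrative only; the real
// hooks call upsertTriggerSchedule / updateTriggerScheduleEnabled /
// deleteTriggerSchedule against the runtime DB.
const triggerSchedules = new Map<string, { nextRunAt: string | null; enabled: boolean }>();

function onTriggerCreated(id: string, nextRunAt: string): void {
  triggerSchedules.set(id, { nextRunAt, enabled: true });
}

function onTriggerUpdated(id: string, enabled: boolean): void {
  const row = triggerSchedules.get(id);
  if (row) row.enabled = enabled; // the scheduler picks this up on its next tick
}

function onTriggerDeleted(id: string): void {
  triggerSchedules.delete(id);
}

onTriggerCreated('t1', '2026-03-14T09:00:00Z');
onTriggerUpdated('t1', false);
// triggerSchedules.get('t1')!.enabled === false
onTriggerDeleted('t1');
// triggerSchedules.has('t1') === false
```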

ScheduledTriggerService.ts · scheduled-triggers.ts


scheduledTriggerRunnerWorkflow — one-shot execution

Before: The runner workflow contained sleep logic, daisy-chaining (startNextIterationStep), pre/post-sleep enabled checks, pending invocation lookup, and parent adoption.
After: A stateless one-shot workflow: verify trigger enabled → create invocation (idempotent) → execute with retries → mark completed/failed.

scheduledTriggerRunner.ts · scheduledTriggerSteps.ts


Deploy hook: /api/deploy/restart-scheduler

Before: No mechanism to move the scheduler onto a new Vercel deployment.
After: A POST /api/deploy/restart-scheduler endpoint (authenticated via INKEEP_AGENTS_RUN_API_BYPASS_SECRET) starts a new scheduler workflow. The vercel-production.yml CI pipeline calls it after promotion.

restartScheduler.ts · vercel-production.yml · index.ts



@pullfrog pullfrog bot left a comment


Solid architectural improvement — moving from per-trigger daisy-chaining to a centralized scheduler with a trigger_schedules table is much simpler to reason about and operate. The claim/advance/rollback pattern is well thought out.

A few issues to address before merge, ordered by severity:

  1. Security bug: The restart endpoint is accessible when INKEEP_AGENTS_RUN_API_BYPASS_SECRET is unset (undefined !== undefined is false).
  2. Correctness: lte vs eq in claim safety, missing cronTimezone in schedule-change detection, gutted reconciliation with no replacement.
  3. Robustness: No error handling in the restart handler, no error handling around the dispatch tick in the scheduler loop, claim-then-release flow has a theoretical re-dispatch window for fast cron intervals.

Issues on lines outside the diff (cannot be commented inline):

  • agents-api/src/data-reconciliation/handlers/scheduled-triggers.ts:14-16scheduleChanged checks cronExpression and runAt but not cronTimezone. Changing the timezone of a cron expression changes the effective schedule (e.g. 0 9 * * * in UTC vs EST), but this code path won't cancel pending invocations or recompute nextRunAt.

  • agents-api/src/data-reconciliation/handlers/scheduled-triggers.ts:26-34 — The check function now returns all-empty arrays, effectively disabling data reconciliation for scheduled triggers. In the new architecture, a useful check would verify that every enabled trigger in the manage DB has a corresponding trigger_schedules row with a non-null nextRunAt, and that no orphaned rows exist. Consider adding a TODO or a basic cross-table consistency check.


const authHeader = c.req.header('Authorization');
const token = authHeader?.replace('Bearer ', '');

if (token !== env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {

Bug: undefined secret matches undefined token. INKEEP_AGENTS_RUN_API_BYPASS_SECRET is optional in env.ts. If unset, both env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET and token are undefined — so undefined !== undefined is false, granting access to anyone. Add an early guard:

if (!env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
  return c.json({ error: 'Endpoint not configured' }, 503);
}

return c.json({ error: 'Unauthorized' }, 401);
}

const result = await startSchedulerWorkflow();

If startSchedulerWorkflow() throws (e.g. DB connection failure), Hono's onError handler returns a generic 500, and the CI curl -sf will retry with no actionable info. Wrap in try-catch and return a structured error body so CI logs are useful for debugging.

}),
async (c) => {
const authHeader = c.req.header('Authorization');
const token = authHeader?.replace('Bearer ', '');

Nit: replace('Bearer ', '') doesn't validate the prefix; a header like Basic xyz passes through unchanged and is compared as-is. Use authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : undefined for stricter parsing.

expectedClaimedAt: string | null;
}): Promise<boolean> => {
const claimCondition = params.expectedClaimedAt
? lte(triggerSchedules.claimedAt, params.expectedClaimedAt)

lte should be eq for claim safety. lte(claimedAt, expectedClaimedAt) succeeds if claimedAt is any value ≤ the expected timestamp — e.g., a stale claim from a previous cycle would still match. This defeats the optimistic-concurrency purpose. Use eq so only the exact expected state can be claimed.

nextRunAt: string | null;
enabled?: boolean;
}): Promise<void> => {
const set: Record<string, unknown> = {

Nit: Record<string, unknown> bypasses Drizzle's column-name validation — typos in property names would be silently ignored. Consider building the object inline or typing as Partial<typeof triggerSchedules.$inferInsert>.

return 'skipped';
}

await releaseTriggerScheduleClaim(runDbClient)({

Releasing the claim after advancing nextRunAt creates a window where the row is eligible again. For most cron intervals this is fine because nextRunAt is in the future. But for fast intervals (every minute), the next nextRunAt could already be due by the time releaseTriggerScheduleClaim runs. Consider combining advance + release into a single atomic update, or omitting the release entirely (the advance already moves the schedule forward).

tenantId,
scheduledTriggerId,
nextRunAt: schedule.nextRunAt,
enabled: isOneTime ? true : schedule.enabled,

For one-time triggers, rollback restores enabled: true. If the workflow engine stays down, the next tick (60s later) will re-dispatch indefinitely. Consider adding a failedAttempts counter or max-retry cap to prevent infinite re-dispatch of one-time triggers.

return { status: 'superseded', runId: myRunId };
}

await dispatchDueTriggersStep();

If dispatchDueTriggersStep throws (e.g. transient DB error), the workflow step will fail and the framework's retry mechanism kicks in. Depending on backoff behavior, this could repeatedly spam the DB. Consider wrapping the dispatch in a try-catch to log and continue to the next tick for transient errors.

"${{ needs.deploy-agents-api.outputs.url }}/api/deploy/restart-scheduler" \
-H "Authorization: Bearer ${{ secrets.INKEEP_AGENTS_RUN_API_BYPASS_SECRET }}" \
--retry 3 \
--retry-delay 5

Consider adding --retry-all-errors and --max-time 30. --retry 3 only retries errors curl treats as transient (timeouts and certain HTTP responses such as 429 and most 5xx). If the new deployment hasn't finished booting, the connection may be refused outright, and curl won't retry that without --retry-all-errors.

Comment on lines +16 to +21
const run = await start(schedulerWorkflow, []);

await upsertSchedulerState(runDbClient)({
currentRunId: run.runId,
deploymentId: getDeploymentId(),
});

Race: supersession window allows dual dispatch. start(schedulerWorkflow) on line 16 launches the new scheduler, which calls registerSchedulerStep (writing runId to scheduler_state). But the old scheduler may have already woken and passed checkSchedulerCurrentStep before the upsert lands. Consider calling upsertSchedulerState before start(schedulerWorkflow) with a sentinel value, so the old scheduler's next check fails immediately.


@claude claude bot left a comment


PR Review Summary

(9) Total Issues | Risk: High

🔴❗ Critical (2) ❗🔴

🔴 1) trigger_schedules Missing data migration for existing scheduled triggers

Issue: The new trigger_schedules runtime table will be empty after deployment. Existing enabled triggers in the manage DB (scheduled_triggers table) will not be backfilled. The scheduler workflow reads from trigger_schedules to dispatch triggers, but only newly created/updated triggers will be synced via onTriggerCreated/onTriggerUpdated.

Why: After deploying this migration, all existing enabled scheduled triggers will stop executing. The old per-trigger daisy-chaining workflows have been removed, and the new centralized scheduler will find zero rows in trigger_schedules. Production cron jobs and one-time triggers will silently fail to run until manually touched via the API or UI. This is a data consistency gap that will cause production outages.

Fix: Add a one-time backfill step. Options:

  1. Add a migration script (similar to scripts/sync-spicedb.sh) that queries all enabled triggers from manage DB and inserts corresponding rows into trigger_schedules
  2. Add backfill logic to startSchedulerWorkflow() in agents-api/src/index.ts that runs on server startup before the scheduler starts
  3. Use the data reconciliation framework to sync existing triggers on first scheduler tick

Example:

// In SchedulerService.ts or index.ts startup
async function backfillTriggerSchedules() {
  const enabledTriggers = await listEnabledScheduledTriggers(manageDb)({ scopes: { /* all tenants */ } });
  for (const trigger of enabledTriggers) {
    await syncTriggerToScheduleTable(trigger);
  }
}


Inline Comments:

  • 🔴 Critical: restartScheduler.ts:32 Timing-attack vulnerable secret comparison + bypass when secret unset

🟠⚠️ Major (4) 🟠⚠️

🟠 1) trigger_schedules No claim timeout mechanism - stuck triggers unrecoverable

Issue: The claimedAt field has no expiration mechanism. If a dispatcher crashes after claimTriggerSchedule but before releaseTriggerScheduleClaim, the trigger remains claimed indefinitely. The partial index excludes claimed rows from findDueTriggerSchedules, so the trigger will never fire again.

Why: This creates a permanent silent failure mode. The only recovery would be manual database intervention to clear claimedAt. There's no alerting, no self-healing, and no visibility into stuck claims.

Fix: Modify the claim condition to treat stale claims as reclaimable:

// In claimTriggerSchedule - also claim if claimedAt is older than threshold
const claimCondition = or(
  isNull(triggerSchedules.claimedAt),
  lt(triggerSchedules.claimedAt, sql`now() - interval '5 minutes'`)
);

Or add a periodic cleanup job that releases claims older than a threshold.


🟠 2) scheduled-triggers.ts Data reconciliation check gutted - no observability into scheduler health

Issue: The check() function now returns empty arrays for all audit categories. This removes the ability to detect orphaned, missing, or stuck triggers through the data reconciliation system.

Why: Operations teams lose visibility into scheduled trigger health. If triggers fail to sync to trigger_schedules or the scheduler workflow dies, there is no automated detection. This contradicts the existing data reconciliation pattern used throughout the codebase.

Fix: Implement a new check function that validates the new architecture:

check: async (ctx): Promise<ScheduledTriggerAuditResult> => {
  const [enabledTriggers, schedules] = await Promise.all([
    listEnabledScheduledTriggers(ctx.manageDb)({ scopes: ctx.scopes }),
    listTriggerSchedulesByProject(ctx.runDb)({ scopes: ctx.scopes }),
  ]);
  
  const scheduleMap = new Map(schedules.map(s => [s.scheduledTriggerId, s]));
  const missingWorkflows = enabledTriggers
    .filter(t => !scheduleMap.has(t.id))
    .map(t => ({ triggerId: t.id, triggerName: t.name }));
  
  return { missingWorkflows, orphanedWorkflows: [], staleWorkflows: [], deadWorkflows: [], verificationFailures: [] };
}


🟠 3) system Missing changeset for schema/behavior changes

Issue: This PR adds a database migration, new data-access exports, and a new API endpoint. Per AGENTS.md changeset guidance, schema changes requiring migration warrant a minor bump.

Fix: Create a changeset:

pnpm bump minor --pkg agents-core --pkg agents-api "Add scheduler workflow with centralized trigger dispatch and deploy restart endpoint"

🟠 4) system Critical paths have no test coverage

Issue: The following new code has no test coverage:

  • dispatchSingleTrigger — claim/advance/rollback concurrency control
  • computeNextRunAt — cron parsing and timezone handling
  • claimTriggerSchedule — optimistic locking primitive
  • checkSchedulerCurrentStep — scheduler supersession logic
  • /api/deploy/restart-scheduler — auth validation

Why: The dispatcher's claim/advance/rollback sequence is the core scheduling mechanism. A bug here could cause duplicate dispatches, permanently stuck triggers, or silent missed executions. These are the highest-risk code paths with zero coverage.


Inline Comments:

  • 🟠 Major: triggerDispatcher.ts:108 No error handling for claim release
  • 🟠 Major: restartScheduler.ts:36 No error handling for scheduler restart
  • 🟠 Major: computeNextRunAt.ts:21 Unhandled cron parsing exception

🟡 Minor (3) 🟡

🟡 1) SchedulerService.ts:16 Race condition between workflow start and state registration

Issue: The scheduler state is updated (line 18-21) after start() succeeds (line 16). If the process crashes between these operations, the new workflow runs without being registered in scheduler_state. However, registerSchedulerStep inside the workflow also calls upsertSchedulerState, providing a fallback.

Why: The double-registration is fine but creates potential inconsistency during the race window. Low severity because the workflow self-registers.


🟡 2) index.ts:142 Scheduler startup failure doesn't prevent server from serving traffic

Issue: If startSchedulerWorkflow() fails during startup, the error is caught and logged but the server continues running without a scheduler. This is a silent failure state.

Fix: Consider exposing scheduler status via a health endpoint, or emit metrics/alerts when the scheduler fails to start.


🟡 3) triggerSchedules.ts:87-101 No limit on findDueTriggerSchedules query

Issue: The query returns all due triggers without a LIMIT clause. If many triggers become due simultaneously (e.g., after an outage), this could cause memory pressure.

Fix: Add a configurable LIMIT and process in batches across ticks.
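A sketch of the suggested bounded batch, assuming due rows are taken oldest-first and leftovers roll over to the next tick (hypothetical helper, not code from the PR):

```typescript
// Hypothetical bounded-batch helper. The PR's findDueTriggerSchedules
// currently returns all due rows; this shows the suggested LIMIT semantics.
function takeDueBatch<T extends { nextRunAt: number }>(
  rows: T[],
  now: number,
  limit: number,
): T[] {
  return rows
    .filter((r) => r.nextRunAt <= now)
    .sort((a, b) => a.nextRunAt - b.nextRunAt) // oldest first
    .slice(0, limit); // leftover due rows are picked up on the next tick
}

const due = takeDueBatch(
  [{ nextRunAt: 1 }, { nextRunAt: 3 }, { nextRunAt: 2 }, { nextRunAt: 99 }],
  10,
  2,
);
// due === [{ nextRunAt: 1 }, { nextRunAt: 2 }]
```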


Inline Comments:

  • 🟡 Minor: vercel-production.yml:170 Missing timeout-minutes
  • 🟡 Minor: vercel-production.yml:178 curl --retry won't retry refused connections

🚫 REQUEST CHANGES

Summary: This PR introduces a well-designed centralized scheduler architecture, but has a critical data migration gap that will break all existing scheduled triggers on deployment. The trigger_schedules table will be empty, and existing triggers won't fire until manually touched. Additionally, there are several error handling gaps that could leave triggers permanently stuck, and the data reconciliation check has been gutted without replacement. Please address the migration backfill and error handling issues before merging.

Discarded (8)
Location | Issue | Reason Discarded
schedulerSteps.ts:8 | Duplicate scheduler state registration | Intentional fallback - workflow self-registers as recovery mechanism
triggerDispatcher.ts:89 | Non-null assertion on nextRunAt | Query guarantees non-null via WHERE clause; fragile but correct
triggerDispatcher.ts:45-49 | Error logging without trigger context | Minor logging improvement, not blocking
schedulerWorkflow.ts:30 | Missing error handling in infinite loop | Workflow framework handles step retries
vercel-production.yml:9 | Missing secret documentation | Documentation-only, not blocking
runtime-schema.ts:960 | PK doesn't include all scope columns | Intentional simplification, scheduledTriggerId unique per tenant
restartScheduler.ts:20 | Handler-level auth vs middleware | Intentional pattern for deploy hooks
ScheduledTriggerService.ts:45-52 | No error handling in onTriggerCreated | Error propagates to caller appropriately
Reviewers (8)
Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded
pr-review-architecture | 5 | 2 | 0 | 0 | 0 | 0 | 3
pr-review-errors | 8 | 1 | 0 | 0 | 3 | 0 | 4
pr-review-sre | 8 | 2 | 0 | 0 | 1 | 0 | 5
pr-review-tests | 6 | 1 | 0 | 0 | 0 | 0 | 5
pr-review-breaking-changes | 4 | 1 | 0 | 0 | 0 | 0 | 3
pr-review-security-iam | 3 | 0 | 0 | 0 | 1 | 0 | 2
pr-review-devops | 6 | 1 | 0 | 0 | 2 | 0 | 3
pr-review-standards | 3 | 1 | 0 | 0 | 0 | 0 | 2
Total | 43 | 9 | 0 | 0 | 7 | 0 | 27


if (token !== env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
return c.json({ error: 'Unauthorized' }, 401);
}

🔴 CRITICAL: Timing-attack vulnerable secret comparison + bypass when secret unset

Issue: This comparison is not constant-time and allows auth bypass when INKEEP_AGENTS_RUN_API_BYPASS_SECRET is not configured. When the env var is unset (undefined), and no Authorization header is sent (token is also undefined), then undefined !== undefined is false, so auth passes.

Why: An attacker could probe the endpoint to infer the secret via timing differences. More critically, if deployed without the secret configured, the endpoint becomes fully unauthenticated, allowing anyone to restart the scheduler.

Fix:

Suggested change:

if (!env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
  return c.json({ error: 'Endpoint not available' }, 503);
}
if (!token || !constantTimeEqual(token, env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET)) {
  return c.json({ error: 'Unauthorized' }, 401);
}

You'll need to add a constantTimeEqual helper using crypto.timingSafeEqual() - see existing patterns in packages/agents-core/src/utils/apiKeys.ts:97.
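An assumed shape for such a helper, built on Node's crypto.timingSafeEqual (the project's existing pattern in agents-core may differ):

```typescript
import { timingSafeEqual } from 'node:crypto';

// Assumed sketch of a constantTimeEqual helper; the project's own version
// in packages/agents-core/src/utils/apiKeys.ts may differ.
function constantTimeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  // timingSafeEqual throws on length mismatch, so guard first. The length
  // check leaks only the length, not the secret's contents.
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
}

// constantTimeEqual('secret', 'secret') === true
// constantTimeEqual('secret', 'Secret') === false
```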

await releaseTriggerScheduleClaim(runDbClient)({
tenantId,
scheduledTriggerId,
});

🟠 MAJOR: No error handling for claim release - triggers can get stuck permanently

Issue: If releaseTriggerScheduleClaim fails after the workflow has been successfully started (line 93), the trigger remains claimed indefinitely. There's no claim timeout mechanism in the schema, so this trigger would never fire again.

Why: A transient database error at this point would leave claimedAt set with no recovery path. The only symptom would be silently missing scheduled executions, which are very hard to debug.

Fix:

Suggested change:

try {
  await releaseTriggerScheduleClaim(runDbClient)({
    tenantId,
    scheduledTriggerId,
  });
} catch (err) {
  logger.error(
    { scheduledTriggerId, err },
    'Failed to release trigger claim after successful dispatch - trigger may remain claimed'
  );
  // Don't throw - workflow is already running
}

Also consider adding a claim timeout mechanism in findDueTriggerSchedules to reclaim triggers where claimed_at is older than a threshold (e.g., 5 minutes).


const result = await startSchedulerWorkflow();

logger.info(result, 'Scheduler workflow restarted via deploy hook');

🟠 MAJOR: No error handling - scheduler restart failures are silent

Issue: If startSchedulerWorkflow() throws, the error propagates as an unhandled exception. The CI deploy step will fail, but the error details won't be surfaced in the response body.

Why: A failed scheduler restart during deployment leaves the system without an active scheduler workflow. Scheduled triggers would silently stop firing with no clear indication of why.

Fix:

Suggested change:

try {
  const result = await startSchedulerWorkflow();
  logger.info(result, 'Scheduler workflow restarted via deploy hook');
  return c.json(result);
} catch (err) {
  logger.error({ error: err }, 'Failed to restart scheduler workflow via deploy hook');
  return c.json(
    { error: 'Failed to start scheduler workflow', details: err instanceof Error ? err.message : String(err) },
    500
  );
}

currentDate: baseDate,
tz: cronTimezone || 'UTC',
});
return interval.next().toISOString();

🟠 MAJOR: Unhandled cron parsing exception can leave triggers stuck

Issue: CronExpressionParser.parse() throws if the cron expression is invalid. This function is called from syncTriggerToScheduleTable and dispatchSingleTrigger. An invalid expression that bypasses validation would cause unhandled exceptions.

Why: In dispatchSingleTrigger, this throws after claiming the trigger but before advancing it, leaving the trigger stuck in claimed state.

Fix:

Suggested change:

if (cronExpression) {
  try {
    const baseDate = lastScheduledFor ? new Date(lastScheduledFor) : new Date();
    const interval = CronExpressionParser.parse(cronExpression, {
      currentDate: baseDate,
      tz: cronTimezone || 'UTC',
    });
    return interval.next().toISOString();
  } catch (err) {
    throw new Error(
      `Invalid cron expression '${cronExpression}': ${err instanceof Error ? err.message : String(err)}`
    );
  }
}

restart-scheduler:
name: Restart scheduler workflow
needs: [promote, deploy-agents-api]
runs-on: ubuntu-latest

🟡 Minor: Missing timeout-minutes on restart-scheduler job

Issue: This job has no timeout-minutes setting. If the endpoint hangs, the job could run until GitHub's 6-hour default.

Fix:

Suggested change:

runs-on: ubuntu-latest
timeout-minutes: 5
steps:

Other jobs in this repo use timeout-minutes: 15-30. Since the curl has retries (max ~15s), 5 minutes provides ample margin.

"${{ needs.deploy-agents-api.outputs.url }}/api/deploy/restart-scheduler" \
-H "Authorization: Bearer ${{ secrets.INKEEP_AGENTS_RUN_API_BYPASS_SECRET }}" \
--retry 3 \
--retry-delay 5

🟡 Minor: curl --retry won't retry refused connections by default

Issue: --retry 3 only retries errors curl classifies as transient (timeouts and HTTP 408/429 and most 5xx responses). A refused connection, which is likely while the new deployment is still booting, is not retried.

Fix: Add --retry-all-errors so curl also retries refused connections and other hard failures:

Suggested change:

--retry 3 \
--retry-delay 5 \
--retry-all-errors

This is curl 7.71+, available on ubuntu-latest runners.

@github-actions github-actions bot deleted a comment from claude bot Mar 13, 2026
@itoqa

itoqa bot commented Mar 13, 2026

Ito Test Report ❌

19 test cases ran. 12 passed, 7 failed.

This run confirms multiple production-code defects across deploy-hook authentication and scheduled-trigger execution paths. ✅ Core CRUD, authorization boundaries, mobile usability, and injection protections behaved as expected in included passing cases, while several scheduling and rapid-action flows still show real correctness gaps under code-first review.

✅ Passed (12)
Test Case | Summary | Timestamp | Screenshot
ROUTE-4 | Created a recurring scheduled trigger, edited it, and confirmed updated values persisted in the scheduled triggers list. | 3:55 | ROUTE-4_3-55.png
ROUTE-5 | Deleting the scheduled trigger removed it from the list, and refreshing the stale invocations tab produced a safe 404 page. | 8:07 | ROUTE-5_8-07.png
ROUTE-6 | Invocation list stayed coherent while switching status views after trigger execution, with no duplicated historical entries. | 15:43 | ROUTE-6_15-43.png
ROUTE-7 | Workflow process endpoint returned 200 with processed=true and timestamp. | 0:00 | ROUTE-7_0-00.png
EDGE-3 | After disabling the recurring trigger and observing/refreshing, no new scheduler dispatches were observed. | 15:43 | EDGE-3_15-43.png
EDGE-5 | Submit + immediate refresh kept a single created trigger row with no duplicates. | 20:46 | EDGE-5_20-46.png
JOURNEY-1 | Deep-link edit and invocations navigation remained coherent through back/forward transitions and hard refresh. | 8:07 | JOURNEY-1_8-07.png
MOBILE-1 | iPhone 13 viewport retained accessible trigger list actions, edit controls, and save flow. | 8:08 | MOBILE-1_8-08.png
ADV-1 | Malformed Authorization variants consistently returned 401 Unauthorized. | 0:00 | ADV-1_0-00.png
ADV-2 | Cross-project and cross-tenant tampering attempts returned 403 without foreign metadata disclosure. | 26:57 | ADV-2_26-57.png
ADV-3 | Non-admin runAsUser patch attempt returned 403 and did not change trigger runAsUserId. | 26:57 | ADV-3_26-57.png
ADV-4 | Script-like payload/template content did not execute in UI and surfaces remained stable. | 21:19 | ADV-4_21-19.png
❌ Failed (7)
| Test Case | Summary | Timestamp | Screenshot |
| --- | --- | --- | --- |
| ROUTE-1 | Both bearer-token restart attempts returned 401 Unauthorized instead of 200. | 0:00 | ROUTE-1_0-00.png |
| ROUTE-2 | Request without Authorization header returned 200 with runId payload instead of 401. | 0:00 | ROUTE-2_0-00.png |
| EDGE-1 | Invalid cron value was accepted and rendered as 'Hourly at :61', showing incomplete validation. | 18:15 | EDGE-1_18-15.png |
| EDGE-2 | One-time trigger created with past runAt appeared as a new pending invocation. | 15:43 | EDGE-2_15-43.png |
| EDGE-4 | Equivalent 1:00 AM schedules produced inconsistent nextRunAt behavior across timezone paths. | 20:18 | EDGE-4_20-18.png |
| RAPID-1 | Rapid rerun/cancel interactions produced multiple invocation rows for one trigger. | 15:43 | RAPID-1_15-43.png |
| ADV-5 | Burst traffic still returned 401 for bearer-token requests, with no successful authorized restart observed. | 0:00 | ADV-5_0-00.png |
Restart scheduler endpoint accepts valid bearer token – Failed
  • Where: Deploy hook API endpoint /api/deploy/restart-scheduler in agents-api.

  • Steps to reproduce: Send POST /api/deploy/restart-scheduler with Authorization: Bearer <valid-token> while bypass secret is unset in runtime env.

  • What failed: Valid bearer-token calls are rejected with 401 instead of restarting the scheduler workflow.

  • Code analysis: The handler strips the bearer prefix and compares token equality directly against an optional env var; when the secret is missing, any provided token fails the equality check.

  • Relevant code:

    agents-api/src/routes/restartScheduler.ts (lines 27-31)

    const authHeader = c.req.header('Authorization');
    const token = authHeader?.replace('Bearer ', '');
    
    if (token !== env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
      return c.json({ error: 'Unauthorized' }, 401);
    }

    agents-api/src/env.ts (lines 79-82)

    INKEEP_AGENTS_RUN_API_BYPASS_SECRET: z
      .string()
      .optional()
      .describe('Run API bypass secret for local development and testing (skips auth)'),
  • Why this is likely a bug: Optional secret configuration combined with direct equality creates a broken auth path where valid bearer tokens fail when the secret is unset.

  • Introduced by this PR: Yes – this PR modified the relevant code.

Restart scheduler endpoint rejects missing token – Failed
  • Where: Deploy hook API endpoint /api/deploy/restart-scheduler in agents-api.

  • Steps to reproduce: Send POST /api/deploy/restart-scheduler without an Authorization header when bypass secret is unset.

  • What failed: Missing-token request succeeds with 200 and returns scheduler run IDs instead of 401.

  • Code analysis: Both token and env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET can be undefined; the equality check then passes and incorrectly authorizes an unauthenticated restart.

  • Relevant code:

    agents-api/src/routes/restartScheduler.ts (lines 27-34)

    const authHeader = c.req.header('Authorization');
    const token = authHeader?.replace('Bearer ', '');
    
    if (token !== env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
      return c.json({ error: 'Unauthorized' }, 401);
    }
    
    const result = await startSchedulerWorkflow();
  • Why this is likely a bug: An unauthenticated request should never be authorized; current logic allows auth bypass when the secret is undefined.

  • Introduced by this PR: Yes – this PR modified the relevant code.
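Both restart-scheduler failures above trace to the same comparison: when the optional secret is unset, `token !== undefined` rejects valid tokens, and `undefined !== undefined` authorizes requests with no token at all. A fail-closed sketch (function and variable names here are illustrative assumptions, not the repository's actual identifiers) looks like:

```typescript
import { timingSafeEqual } from 'node:crypto';

// Sketch of a fail-closed bearer check: an unset secret or missing token
// must never authorize, and matching is done in constant time.
function isAuthorized(token: string | undefined, secret: string | undefined): boolean {
  if (!secret || !token) return false; // fail closed
  const a = Buffer.from(token);
  const b = Buffer.from(secret);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}
```

In the handler, an unset secret could instead return 503 to signal the endpoint is not configured, rather than silently failing or passing requests through.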

Invalid cron expression does not break list or edit surfaces – Failed
  • Where: Scheduled trigger creation/update schema and list-time next-run computation.

  • Steps to reproduce: Create or update a trigger with an out-of-range cron minute (for example 61 * * * *) and load scheduled trigger list.

  • What failed: Invalid cron is accepted at write time and later fails at parse time, producing inconsistent/blank next-run behavior.

  • Code analysis: Cron validation uses a regex that checks token structure but not numeric ranges; later parsing errors are only logged and swallowed.

  • Relevant code:

    packages/agents-core/src/validation/schemas.ts (lines 949-954)

    export const CronExpressionSchema = z
      .string()
      .regex(
        /^(\*(?:\/\d+)?|[\d,-]+(?:\/\d+)?)\s+(\*(?:\/\d+)?|[\d,-]+(?:\/\d+)?)\s+(\*(?:\/\d+)?|[\d,-]+(?:\/\d+)?)\s+(\*(?:\/\d+)?|[\d,-]+(?:\/\d+)?)\s+(\*(?:\/\d+)?|[\d,\-A-Za-z]+(?:\/\d+)?)$/,
        'Invalid cron expression. Expected 5 fields: minute hour day month weekday'
      )

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 156-166)

    try {
      const interval = CronExpressionParser.parse(trigger.cronExpression, {
        currentDate: baseDate,
        tz: trigger.cronTimezone || 'UTC',
      });
      const nextDate = interval.next();
      runInfo.nextRunAt = nextDate.toISOString();
    } catch (error) {
      logger.warn(
        { triggerId: trigger.id, cronExpression: trigger.cronExpression, error },
        'Failed to calculate nextRunAt from cron expression'
      );
    }
  • Why this is likely a bug: Production input validation should reject invalid schedules up front instead of persisting values that break runtime scheduling logic.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR).

  • Timestamp: 18:15
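To illustrate the validation gap, a range-aware check for just the minute field (a standalone sketch, not the project's schema) rejects `61` where the current structure-only regex accepts it:

```typescript
// Sketch: validate the numeric range of a cron minute field (0-59),
// covering "*", steps ("*/5"), lists ("15,45"), and ranges ("0-59").
function minuteFieldInRange(field: string): boolean {
  const [base, step] = field.split('/');
  if (step !== undefined && !/^\d+$/.test(step)) return false;
  if (base === '*') return true;
  for (const part of base.split(',')) {
    const [lo, hi] = part.split('-').map(Number);
    if (Number.isNaN(lo) || lo < 0 || lo > 59) return false;
    if (hi !== undefined && (Number.isNaN(hi) || hi > 59 || hi < lo)) return false;
  }
  return true;
}
```

In practice a simpler option is to run the same cron parser used at read time (CronExpressionParser) inside a schema `refine` at write time, so write-time and runtime acceptance can never diverge.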

One-time trigger with past runAt is handled safely – Failed
  • Where: Trigger schedule computation and dispatcher due-trigger logic.

  • Steps to reproduce: Create a one-time trigger with runAt already in the past and allow scheduler dispatch tick.

  • What failed: A past one-time schedule is treated as immediately due and creates a pending invocation instead of being rejected or ignored.

  • Code analysis: computeNextRunAt returns runAt as-is (even if past), and dispatcher queries all due schedules with asOf: now, so stale one-time schedules dispatch instantly.

  • Relevant code:

    agents-api/src/domains/run/services/computeNextRunAt.ts (lines 11-13)

    if (runAt && !cronExpression) {
      return runAt;
    }

    agents-api/src/domains/run/services/triggerDispatcher.ts (lines 27-29)

    const dueTriggers = await findDueTriggerSchedules(runDbClient)({
      asOf: now.toISOString(),
    });
  • Why this is likely a bug: One-time schedules in the past should not generate new executions, but current production logic makes them dispatchable immediately.

  • Introduced by this PR: Yes – this PR modified the relevant code.

  • Timestamp: 15:43
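A guarded version of the one-time branch (a sketch; the real function signature is assumed from the excerpt above) would yield no next run for past runAt values instead of dispatching them immediately:

```typescript
// Sketch: past one-time schedules produce no next run.
function computeNextRunAt(
  runAt: string | null,
  cronExpression: string | null,
  now: Date = new Date()
): string | null {
  if (runAt && !cronExpression) {
    return new Date(runAt) > now ? runAt : null; // past one-time => not dispatchable
  }
  return null; // cron branch elided in this sketch
}
```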

DST/timezone boundary computes expected next run – Failed
  • Where: Scheduled trigger timezone input handling and next-run calculation path.

  • Steps to reproduce: Save cron triggers with different timezone values, including non-IANA values, then inspect list next-run data.

  • What failed: Timezone paths diverge to missing nextRunAt instead of deterministic scheduling behavior.

  • Code analysis: Timezone input is accepted as an arbitrary string (no IANA validation), and parse failures in list-time cron calculation are swallowed, leaving nextRunAt unset.

  • Relevant code:

    packages/agents-core/src/validation/schemas.ts (lines 973-977)

    z
      .string()
      .max(64)
      .default('UTC')
      .describe('IANA timezone for cron expression (e.g., America/New_York, Europe/London)'),

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 156-166)

    try {
      const interval = CronExpressionParser.parse(trigger.cronExpression, {
        currentDate: baseDate,
        tz: trigger.cronTimezone || 'UTC',
      });
      const nextDate = interval.next();
      runInfo.nextRunAt = nextDate.toISOString();
    } catch (error) {
      logger.warn(
        { triggerId: trigger.id, cronExpression: trigger.cronExpression, error },
        'Failed to calculate nextRunAt from cron expression'
      );
    }
  • Why this is likely a bug: Allowing invalid timezone values to persist causes production scheduling state to degrade into missing next-run values instead of controlled validation errors.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR).

  • Timestamp: 20:18
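One common write-time guard (a sketch using the Intl API; not necessarily the fix the project would choose) is to reject non-IANA timezone strings before they persist:

```typescript
// Sketch: Intl.DateTimeFormat throws a RangeError for unknown IANA zones,
// which makes it a zero-dependency validity check.
function isValidIanaTimezone(tz: string): boolean {
  try {
    new Intl.DateTimeFormat('en-US', { timeZone: tz });
    return true;
  } catch {
    return false;
  }
}
```

Plugged into the existing Zod schema as a `refine`, this would turn the current silent nextRunAt degradation into a controlled validation error at write time.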

Rapid clicks on run-now/cancel do not produce duplicate transitions – Failed
  • Where: Manual run endpoint for scheduled triggers (POST /{id}/run).

  • Steps to reproduce: Trigger multiple rapid run-now actions for the same scheduled trigger.

  • What failed: Multiple invocation rows are created in close succession instead of collapsing to a single transition.

  • Code analysis: Each request always generates a fresh invocation ID and a time-based idempotency key, so rapid repeated requests are treated as distinct runs.

  • Relevant code:

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 1352-1364)

    const invocationId = generateId();
    
    await createScheduledTriggerInvocation(runDbClient)({
      id: invocationId,
      tenantId,
      projectId,
      agentId,
      scheduledTriggerId,
      status: 'pending',
      scheduledFor: new Date().toISOString(),
      idempotencyKey: `manual-run-${scheduledTriggerId}-${Date.now()}`,
      attemptNumber: 1,
    });
  • Why this is likely a bug: The endpoint lacks deduplication/locking semantics for burst clicks, so production behavior violates expected idempotent transition handling.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR).

  • Timestamp: 15:43
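A deterministic, time-bucketed key (the window size here is an illustrative assumption) would let the existing idempotencyKey column collapse burst clicks into a single invocation, assuming uniqueness is enforced on that column:

```typescript
// Sketch: requests within the same window produce the same idempotency key,
// so duplicate inserts are rejected instead of creating extra invocation rows.
function manualRunIdempotencyKey(triggerId: string, nowMs: number, windowMs = 5000): string {
  const bucket = Math.floor(nowMs / windowMs);
  return `manual-run-${triggerId}-${bucket}`;
}
```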

Restart endpoint withstands burst requests without degraded auth behavior – Failed
  • Where: Deploy hook API endpoint /api/deploy/restart-scheduler under burst request load.

  • Steps to reproduce: Send concurrent restart requests with bearer tokens while bypass secret is unset.

  • What failed: All bearer requests return 401 and no authorized restart succeeds under burst traffic.

  • Code analysis: Auth logic is deterministic equality against an optional env var; with missing secret, bearer token requests all fail, so burst mode cannot produce valid authorized outcomes.

  • Relevant code:

    agents-api/src/routes/restartScheduler.ts (lines 27-31)

    const authHeader = c.req.header('Authorization');
    const token = authHeader?.replace('Bearer ', '');
    
    if (token !== env.INKEEP_AGENTS_RUN_API_BYPASS_SECRET) {
      return c.json({ error: 'Unauthorized' }, 401);
    }
  • Why this is likely a bug: The same auth defect that blocks valid token requests also causes full failure under burst traffic, so authorized behavior degrades entirely.

  • Introduced by this PR: Yes – this PR modified the relevant code.

📋 View Recording

Screen Recording

Contributor

@claude claude bot left a comment


PR Review Summary

(2) Total Issues | Risk: High

This is a delta review scoped to the 2 commits (11 files) since the last automated review. The delta addresses several prior issues but leaves the most critical one unaddressed.

🔴❗ Critical (1) ❗🔴

🔴 1) system Missing data migration for existing scheduled triggers — STILL UNADDRESSED

Issue: This was flagged in the prior review as Critical and remains unaddressed in this delta. The new trigger_schedules runtime table will be empty after deployment. Existing enabled triggers in the manage DB will not be backfilled.

Why: After deploying this migration, all existing enabled scheduled triggers will stop executing. The old per-trigger daisy-chaining workflows have been removed, and the new centralized scheduler will find zero rows in trigger_schedules. Production cron jobs and one-time triggers will silently fail to run until manually touched via the API or UI.

Fix: Add a one-time backfill step. Options:

  1. Add a migration script that queries all enabled triggers from manage DB and inserts corresponding rows into trigger_schedules
  2. Add backfill logic to startSchedulerWorkflow() in agents-api/src/index.ts that runs on server startup before the scheduler starts
  3. Use the data reconciliation framework to sync existing triggers on first scheduler tick
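Option 2 could be sketched roughly as follows. All function and field names here are assumptions for illustration; the real data-access layer will differ:

```typescript
// Hypothetical one-time backfill: copy enabled triggers from the manage DB
// into the runtime trigger_schedules table before the scheduler starts.
async function backfillTriggerSchedules(
  listEnabledTriggers: () => Promise<Array<{ id: string; nextRunAt: string | null }>>,
  upsertSchedule: (row: { triggerId: string; nextRunAt: string | null }) => Promise<void>
): Promise<number> {
  const triggers = await listEnabledTriggers();
  for (const t of triggers) {
    // Upserting keeps the backfill idempotent if startup runs more than once.
    await upsertSchedule({ triggerId: t.id, nextRunAt: t.nextRunAt });
  }
  return triggers.length;
}
```

Running this from startSchedulerWorkflow() on startup, before the first scheduler tick, would ensure existing cron and one-time triggers keep firing after deployment.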


Inline Comments:

  • 🟠 Major: scheduled-triggers.ts:18-20 scheduleChanged missing cronTimezone comparison
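The flagged comparison, extended to include cronTimezone, might look like the following. Field names are taken from this report; the surrounding type shape is an assumption:

```typescript
type ScheduleFields = {
  cronExpression: string | null;
  runAt: string | null;
  cronTimezone: string | null;
};

// Sketch: include cronTimezone so a timezone-only edit still reschedules the trigger.
function scheduleChanged(a: ScheduleFields, b: ScheduleFields): boolean {
  return (
    a.cronExpression !== b.cronExpression ||
    a.runAt !== b.runAt ||
    a.cronTimezone !== b.cronTimezone
  );
}
```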

🟡 Minor (0) 🟡

No new minor issues in delta.

💭 Consider (1) 💭

💭 1) .github/workflows/vercel-production.yml:167 Consider continue-on-error for non-blocking scheduler restart

Issue: The restart-scheduler job will fail the entire workflow if the scheduler restart endpoint returns an error after all retries. The scheduler has self-recovery mechanisms (supersession checks, re-registration on restart).

Why: If the CI call fails, the scheduler will recover when the API boots or on the next cron tick. However, if scheduler restart is considered critical to deploy success, the current strict behavior is appropriate.

Fix: Add continue-on-error: true at job level if scheduler restart failure should not fail the deploy workflow.


✅ Delta Fixes Verified

The following issues from the prior review have been properly addressed in this delta:

| Issue | Status |
| --- | --- |
| Security: Timing-attack vulnerable secret comparison | ✅ Fixed — constantTimeEqual() using crypto.timingSafeEqual() |
| Security: Auth bypass when secret unset | ✅ Fixed — Returns 503 when INKEEP_AGENTS_RUN_API_BYPASS_SECRET is unset |
| Security: No error handling in restart handler | ✅ Fixed — Try-catch with structured 500 response |
| CI: Missing timeout-minutes | ✅ Fixed — Added timeout-minutes: 2 |
| CI: curl --retry won't retry HTTP 5xx | ✅ Fixed — Added --retry-all-errors |
| Correctness: Data reconciliation check gutted | ✅ Fixed — Restored with cross-table comparison + tests |
| Correctness: No claim timeout mechanism | ✅ Fixed — Added 5-minute stale claim recovery |
| Correctness: lte vs eq in claim safety | ✅ Fixed — Simplified to atomic stale-claim-aware WHERE clause |

🕐 Pending Recommendations (1)


🚫 REQUEST CHANGES

Summary: The delta addresses most feedback from the prior review — security fixes are complete, CI improvements are in place, and the data reconciliation check has been properly restored with tests. However, the critical data migration gap (existing triggers won't be backfilled to trigger_schedules) remains unaddressed. This will cause all existing scheduled triggers to stop firing after deployment. Please add a backfill mechanism before merging.

Discarded (3)
| Location | Issue | Reason Discarded |
| --- | --- | --- |
| registry.test.ts:313 | Missing test for disabled orphaned schedules | Valid but low priority — edge case that affects reconciliation accuracy but not runtime execution |
| registry.test.ts:293 | Missing test edge case documentation | Valid but INFO level — documents intended behavior but doesn't catch bugs |
| vercel-production.yml | Partial index won't cover stale claims query | Acceptable tradeoff — stale claims are rare, unclaimed triggers (common case) use index |
Reviewers (5)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-security-iam | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-sre | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| pr-review-tests | 3 | 0 | 0 | 0 | 0 | 0 | 2 |
| pr-review-devops | 4 | 0 | 1 | 0 | 0 | 0 | 1 |
| Total | 8 | 0 | 1 | 0 | 1 | 0 | 3 |

Note: Security and standards reviewers returned 0 issues because all prior security issues were verified as fixed.

@github-actions github-actions bot deleted a comment from claude bot Mar 13, 2026
@itoqa

itoqa bot commented Mar 13, 2026

Ito Test Report ❌

13 test cases ran. 11 passed, 2 failed.

✅ The scheduled-trigger regression pass found stable behavior for deep-link loading, validation, mobile usability, invocation creation, and auth hardening checks. 🔍 Code-first verification of the two failures indicates likely product defects in next-run handling for past one-time schedules and in rapid-toggle state consistency under concurrent updates.

✅ Passed (11)
| Test Case | Summary | Timestamp | Screenshot |
| --- | --- | --- | --- |
| ROUTE-4 | Endpoint returned 503 with error Endpoint not available as expected for unavailable run bypass secret profile. | 0:00 | ROUTE-4_0-00.png |
| ROUTE-5 | GET /api/workflow/process returned 200 with valid JSON {processed:true} and completed in 50 seconds, within expected long-running window. | 1:30 | ROUTE-5_1-30.png |
| LOGIC-4 | Deleted trigger row was removed from Scheduled tab and invocations view showed no invocation entries. | 9:58 | LOGIC-4_9-58.png |
| LOGIC-5 | Run Now created an invocation record and it progressed to terminal Failed state with trace links. | 9:58 | LOGIC-5_9-58.png |
| LOGIC-6 | Deep-link edit route loaded correctly and persisted populated form data across refresh without unauthorized or missing-data errors. | 12:33 | LOGIC-6_12-33.png |
| EDGE-3 | Submitting invalid custom cron produced validation error and prevented update persistence. | 9:58 | EDGE-3_9-58.png |
| EDGE-4 | At 390x844 viewport, trigger controls remained accessible and an edit/update roundtrip completed successfully. | 15:26 | EDGE-4_15-26.png |
| EDGE-5 | Three back/forward cycles preserved route context and page state without stale data or overlay corruption. | 15:48 | EDGE-5_15-48.png |
| ADV-3 | Direct unauthenticated manage endpoint request returned 401 Unauthorized and did not expose trigger data. | 19:07 | ADV-3_19-07.png |
| ADV-4 | Authenticated tampered-scope request returned 403 and UI tampering rendered 404 project-not-found state with no foreign data exposure. | 19:04 | ADV-4_19-04.png |
| ADV-5 | Script-tag and onerror payload strings were persisted as plain text in list/edit views, and no script execution dialog was triggered. | 19:00 | ADV-5_19-00.png |
❌ Failed (2)
| Test Case | Summary | Timestamp | Screenshot |
| --- | --- | --- | --- |
| EDGE-1 | Created a one-time trigger in the past, but the list still displayed a concrete Next Run timestamp instead of an em dash. | 14:11 | EDGE-1_14-11.png |
| EDGE-2 | After rapid toggling, the intended final disabled state did not persist after refresh and reverted to enabled. | 14:43 | EDGE-2_14-43.png |
One-time trigger scheduled in the past shows no upcoming next run – Failed
  • Where: Scheduled triggers list Next Run column for one-time trigger rows.

  • Steps to reproduce: Create a one-time trigger with runAt in the past, open/refresh the scheduled triggers list, and inspect Next Run.

  • What failed: UI showed a concrete upcoming value instead of no upcoming run (an em dash) for a past one-time schedule.

  • Code analysis: I traced next-run derivation through runtime run-info aggregation and schedule computation. The run-info batch function promotes any pending invocation to nextRunAt without checking whether it is already in the past, and one-time schedule computation returns runAt directly with no past-time guard.

  • Relevant code:

    packages/agents-core/src/data-access/runtime/scheduledTriggerInvocations.ts (lines 479–486)

    for (const inv of allInvocations) {
      const triggerInfo = result.get(inv.scheduledTriggerId);
      if (!triggerInfo) continue;
      if (inv.status === 'pending' && !triggerInfo.nextRunAt) {
        triggerInfo.nextRunAt = inv.scheduledFor;
      }
      if ((inv.status === 'completed' || inv.status === 'failed') && !triggerInfo.lastRunAt) {

    agents-api/src/domains/run/services/computeNextRunAt.ts (lines 11–21)

    if (runAt && !cronExpression) {
      return runAt;
    }
    
    if (cronExpression) {
      const baseDate = lastScheduledFor ? new Date(lastScheduledFor) : new Date();
      const interval = CronExpressionParser.parse(cronExpression, {
        currentDate: baseDate,
        tz: cronTimezone || 'UTC',
      });
  • Why this is likely a bug: The code path can surface stale/past schedule timestamps as nextRunAt, which conflicts with expected one-time past-trigger semantics (no upcoming run).

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR).

  • Timestamp: 14:11
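A minimal guard in the promotion loop (a sketch; it assumes the excerpt's field semantics) would skip past-dated pending invocations rather than surfacing them as nextRunAt:

```typescript
// Sketch: a pending invocation only counts as the next run while its
// scheduledFor timestamp is still in the future.
function nextRunFromPending(scheduledFor: string, now: Date = new Date()): string | null {
  return new Date(scheduledFor) > now ? scheduledFor : null;
}
```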

Rapid enable/disable interaction keeps final trigger state consistent – Failed
  • Where: Scheduled trigger enable switch on the triggers table.

  • Steps to reproduce: Rapidly click the enabled switch multiple times, stop on disabled as intended, refresh, then verify persisted status.

  • What failed: Final persisted state can differ from the user’s final intent after rapid interactions.

  • Code analysis: The UI issues independent toggle requests without serializing/canceling in-flight updates, while the API applies each PATCH as a blind write with no version/ordering guard. This allows out-of-order request completion to overwrite the intended final value.

  • Relevant code:

    agents-manage-ui/src/components/project-triggers/project-scheduled-triggers-table.tsx (lines 95–106)

    const toggleEnabled = async (triggerId: string, agentId: string, currentEnabled: boolean) => {
      const newEnabled = !currentEnabled;
      setLoadingTriggers((prev) => new Set(prev).add(triggerId));
    
      try {
        const result = await updateScheduledTriggerEnabledAction(
          tenantId,
          projectId,
          agentId,

    agents-manage-ui/src/components/project-triggers/project-scheduled-triggers-table.tsx (lines 317–322)

    <Switch
      checked={trigger.enabled}
      onCheckedChange={() =>
        toggleEnabled(trigger.id, trigger.agentId, trigger.enabled)
      }
      disabled={isLoading || !canManage}
    />

    agents-api/src/domains/manage/routes/scheduledTriggers.ts (lines 568–582)

    const updatedTrigger = await updateScheduledTrigger(db)({
      scopes: { tenantId, projectId, agentId },
      scheduledTriggerId: id,
      data: {
        ...body,
        maxRetries: resolveRetryValue(body.maxRetries, existing.maxRetries, 3),
        retryDelaySeconds: resolveRetryValue(
          body.retryDelaySeconds,
          existing.retryDelaySeconds,
          60
        ),
  • Why this is likely a bug: Without request-order protection, rapid toggles can commit stale writes and violate final-state consistency expected by users.

  • Introduced by this PR: No – pre-existing bug (code not changed in this PR).

  • Timestamp: 14:43
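On the client side, one standard mitigation (a sketch, not the project's actual components) is to tag each toggle request with a sequence number and ignore responses that are no longer the latest, so an out-of-order completion cannot overwrite the user's final choice:

```typescript
// Sketch: a per-trigger guard that tracks the most recent request; stale
// responses fail the isCurrent check and should be discarded by the caller.
function makeToggleGuard() {
  let latest = 0;
  return {
    next(): number {
      return ++latest;
    },
    isCurrent(seq: number): boolean {
      return seq === latest;
    },
  };
}
```

A server-side complement would be an optimistic-concurrency guard (e.g., a version column checked in the UPDATE's WHERE clause) so blind PATCH writes cannot commit stale state.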

📋 View Recording

Screen Recording

Contributor

@claude claude bot left a comment


PR Review Summary

(0) Total Issues | Risk: Low (delta scope)

This is a delta review scoped to changes since the last automated review (7af85b5d481f).

Delta Changes

The only change in this delta is the addition of a new specification document:

  • specs/2026-03-13-manage-table-cron-dispatcher/SPEC.md (+256 lines)

This document evaluates an alternative architectural approach (using the manage DoltgreSQL table vs runtime Postgres table for scheduling) and recommends against it. The analysis is thorough and well-reasoned, covering:

  • Branch iteration cost scaling (O(n) vs O(1))
  • DoltgreSQL versioning overhead for high-frequency transactional writes
  • Connection pool pressure
  • Broken claim/release locking semantics in Dolt's commit model

The spec correctly concludes that the runtime-table approach (already implemented in this PR) is the better design choice.

✅ Delta Review: Clean

No new issues in the delta. The spec document is well-written architectural documentation that supports the implementation decisions already made.


🕐 Pending Recommendations (1)


💡 APPROVE WITH SUGGESTIONS

Summary: The delta (spec document) is clean and adds valuable architectural context. However, the critical data migration gap from prior reviews remains unaddressed — existing scheduled triggers will not be synced to the new trigger_schedules table on deployment, causing all existing cron/one-time triggers to stop firing. Please add a backfill mechanism before merging.

Reviewers (0)

No code reviewers dispatched — delta contains only documentation changes.
