feat(tools/modal): add Function + Volume capabilities #55
Closed
KillerQueen-Z wants to merge 12 commits into
Conversation
This is the first feature/vscode-extension* branch built on top of origin/main directly rather than a stack of cherry-picks. The previous branch had drifted ~500 commits behind main as upstream shipped:

- v3.10.0 detached background tasks (Detach tool + franklin task CLI)
- v3.9.0 Skills system (SKILL.md loader, registry, bundled grill)
- v3.9.1 status bar shows chain + default spend cap raised to $2
- v3.9.2 Kimi K2.6 alignment
- v3.9.3 /model picker trim 28 → 23
- v3.9.4 roleplayed JSON tool-calls + V4 Flash / Omni metadata
- v3.9.5 Nemotron Omni prose stripping + gpt-image-2 size pin
- v3.9.6 reasoning-model TTFB defaults + long-task guidance
- v3.8.40 i2i timeout (#19) + configurable spend cap (#20) — already our PRs, now confirmed merged
- v3.8.41 smart timeout recovery (#26)
- v3.8.42 default spend cap $0.25 → $1.00 (#28)
- v3.8.43 proxy: per-request timeout + payment-aware fallback (#31)
- #34 SKILL.md skills loader
- #35 first-class Wallet tool

Cherry-picking each onto the old branch would have produced a wall of no-op-content / phantom conflicts (the cherry-picks didn't share commit hashes with main even though their content matched). Instead this branch starts from origin/main and re-applies only the bits that are genuinely extension-specific:

- vscode-extension/ (entire directory — webview app, build, README, mascot images, VSIX assets)
- src/api/vscode-session.ts (new file: extension-host session helper)
- src/commands/config.ts (added default-image-model + default-video-model keys; exported saveConfig for the settings popover; kept main's $1 default comment + max-turn-spend-usd key)
- src/agent/streaming-executor.ts (added ImageGen / VideoGen case to inputPreview so the timeline shows the model)
- src/commands/doctor.ts (export runChecks so vscode-session can re-export it as runDoctorChecks)
- package.json (./vscode-session export — alongside ./wallet, etc.)

Bumps vscode-extension to 0.5.0 (was 0.4.5).
Also adds vscode-extension/*.vsix to .gitignore — packaged builds shouldn't be tracked. The old feature/vscode-extension branch is preserved at backup/vscode-extension-pre-sync.
Mirror of upstream PR #36 (fix/savings-includes-media-cost).

The "Saved vs Opus" panel hero would show negative dollar amounts as soon as a user spent meaningfully on ImageGen / VideoGen, e.g.:

    $-8.79  You spent $20.4896 instead of $11.70

Root cause: getStatsSummary() compared an Opus-token baseline (chat only — image/video log inputTokens=0/outputTokens=0) against totalCostUsd (chat + media combined), so once media spend exceeded the chat-vs-Opus delta the difference flipped negative.

Fix: split byModel into chatOnlyCost (rows with tokens) and mediaCost (rows without). opusCost on the display side now equals opusChatCost + mediaCost so "you spent X instead of Y" stays apples-to-apples; saved = max(0, opusChatCost - chatOnlyCost) is the chat-side delta only and is clamped non-negative.

Bumps vscode-extension to 0.5.1; updates README changelog.
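The invariant is easy to get wrong, so here is a minimal sketch of the split. The `UsageRow` shape and the Opus per-token rates below are illustrative stand-ins, not the real tracker types:

```typescript
// Hypothetical usage row: media rows (image/video) log zero tokens.
interface UsageRow {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Illustrative Opus pricing; the real baseline rates live in the stats tracker.
const OPUS_IN_PER_TOK = 15 / 1_000_000;
const OPUS_OUT_PER_TOK = 75 / 1_000_000;

function summarize(rows: UsageRow[]) {
  // Split: rows with tokens are chat; token-less rows are media.
  const chatRows = rows.filter(r => r.inputTokens > 0 || r.outputTokens > 0);
  const mediaRows = rows.filter(r => r.inputTokens === 0 && r.outputTokens === 0);

  const chatOnlyCost = chatRows.reduce((s, r) => s + r.costUsd, 0);
  const mediaCost = mediaRows.reduce((s, r) => s + r.costUsd, 0);
  const opusChatCost = chatRows.reduce(
    (s, r) => s + r.inputTokens * OPUS_IN_PER_TOK + r.outputTokens * OPUS_OUT_PER_TOK,
    0,
  );

  return {
    spent: chatOnlyCost + mediaCost,
    // Apples-to-apples: media cost is added to BOTH sides of the comparison.
    opusCost: opusChatCost + mediaCost,
    // Chat-side delta only, clamped so media spend can never flip it negative.
    saved: Math.max(0, opusChatCost - chatOnlyCost),
  };
}
```

Because mediaCost appears on both sides, `opusCost - spent` collapses back to the chat-only delta, which is exactly what `saved` reports.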
…ion-v0.5

# Conflicts:
#	src/panel/html.ts
#	src/stats/tracker.ts
…+ history rename/delete + Detach cwd fix + insights category breakdown + wallet QR + GPU sandbox panel + session import + rate-limit toast

This branch preserves the in-progress parallel image/video generation feature (concurrent: 'batch' + askUser merge + walletReservation + runBatchPool), which the v0.5 extension branch will revert until it's been validated end-to-end.

Companion features intentionally kept on v0.5:
- Modal sandbox tools (use walletReservation, but only from Modal, not media gen)
- Detach cwd-resolution fix
- History rename/delete
- Wallet QR popover
- Tasks + GPU Sandboxes overlay panels
- Session import (Claude Code / Codex)
- Rate-limit friendly toast
- Insights By Category breakdown

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… feature/parallel-media-gen

The cherry-pick brought in everything from the WIP branch, including the in-progress parallel image/video pipeline (concurrent: 'batch', batch preflight, askUser mutex, walletReservation, batch-concurrency config, settings UI). That pipeline hasn't been validated end-to-end yet, so it's deferred to feature/parallel-media-gen until ready.

What's KEPT on v0.5 (validated, ship-ready):
- Modal sandbox tools (ModalCreate/Exec/Status/Terminate) + GPU sandbox panel
- Detach cwd-resolution bug fix + 4-strategy fallback
- History rename / delete with inline confirm UI
- Wallet QR popover (chain-aware EIP-681 / Solana Pay)
- Tasks overlay + badge
- Session import (Claude Code / Codex)
- Rate-limit friendly toast
- Insights By Category breakdown (chat/media/sandbox)
- Image gen: response_format strip for gpt-image-* family + verbose error diagnostics + async polling for slow models
- Default-image-model / default-video-model config consultation
- Defensive sanitizeOutgoingMessages in llm.ts
- Modal tool exempt from 3-failure auto-disable in tool-guard
- Settings popover refresh + obsolete max-turn-spend-usd auto-strip

What's REMOVED (now only on feature/parallel-media-gen):
- 'batch' concurrent mode in CapabilityHandler
- BATCH_CONCURRENCY pool + runBatchPool in streaming-executor
- preflightBatch + askUserChain mutex + batchPreApproved Set
- skipAskUser in ExecutionScope
- walletReservation usage from imagegen/videogen
- batch-concurrency config key
- 'Parallel image / video' setting input
- Random suffix on default output paths

WalletReservation infrastructure stays (Modal tools use it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in 120 commits since the last sync (1106ef5), including:

Vision (this is the big-ticket fix):
- PR #53 + sibling-sites patch: preserve image blocks in budgetToolResults / ageOldToolResults / dedup; client-side sharp resize on Read (1.9MB PNG -> 117KB).

LLM / gateway:
- Gemini Pro non-streaming requests
- 429 Retry-After honoring
- Stream char sanitizer (U+2502 / U+2500 -> ASCII)
- Gateway error text doesn't kill the session
- Classifier separates payment_rejected from payment_required

Stats / cost:
- franklin stats reads cost_log.jsonl (SDK ledger)
- recorded-vs-wallet gap detection
- image/video/modal latency measured at 5 callsites
- agent-loop measures real LLM latency (was hardcoded 0)

Loop / agent:
- same-tool warn-once + signature-based stuck detector
- switch model when intent is declared without tool_use
- --resume preserves cost / token totals

Wallet / Swap:
- Base0xGaslessSwap (user pays no ETH for gas)
- Base 0x V2 + Permit2
- Jupiter Ultra swap with on-chain referral fee

Prediction Market: full rewrite of the wallet-analysis triplet, smartMoney replacement, walletProfile addresses fix.
Trading: TickerToId expansion (TON etc.), dual-listing notice for tokenized equities, live-swap session cap.
Modal: latency tracking, logger migration.
ImageGen: HTTP 202 queue handling, latency, error surfacing.

Conflict resolution:
- tools/modal.ts, tools/index.ts, tasks/spawn.ts, session/storage.ts, agent/tool-guard.ts: take main.
- session/storage.ts: ported v0.5's deleteSession + renameSession + SessionMeta.title back on top of main.
- tools/imagegen.ts: hand-merged. Kept v0.5's broader async detection (handles HTTP 202 + status fields + non-JSON body fallback), kept main's bits-based error surfacing + exported pollImageJob singleton, deduped poll helpers.
- tools/videogen.ts: re-added the missing videoGenCapability singleton export so test fixtures keep importing it.

Tests: 368/368 pass. Build clean.
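The vision fix hinges on one invariant: the optimizer may trim text, but image blocks pass through untouched. A minimal sketch of that invariant (the `Block` type and `budgetToolResult` name here are illustrative, not the real message-model types):

```typescript
// Hypothetical content-block shapes; the real types live in the agent's message model.
type Block =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

// Sketch: text blocks are truncated to fit a character budget, but image
// blocks are returned as-is, so the model still sees the pixels instead
// of hallucinating a description of a stripped attachment.
function budgetToolResult(blocks: Block[], textBudget: number): Block[] {
  let remaining = textBudget;
  return blocks.map(b => {
    if (b.type === "image") return b; // never dropped or truncated
    const kept = b.text.slice(0, Math.max(0, remaining));
    remaining -= kept.length;
    return { type: "text", text: kept };
  });
}
```

The same invariant would apply at every optimizer site named above (budgetToolResults, ageOldToolResults, dedup): each one maps over blocks and short-circuits on `type === "image"`.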
Bumps the VS Code extension to 0.6.1 and rebundles out/extension.cjs on top of the v0.5↔main merge. The user-visible win is the vision fix: image paste / drop / file Read no longer over-charges ($0.50/call → bounded by client-side sharp resize) and no longer hallucinates descriptions (image blocks now survive the optimizer pipeline end to end). The bundled franklin core jumps from v3.10.x territory to v3.15.90, picking up the 120 commits enumerated in the merge message — the extension inherits all of them with no extension-side code change required (UI surfaces of the new prediction / Base / Modal / etc. tools land automatically through the agent's tool inventory).
…image guard

Brings in d370a38 + 5003b67. The two patches do exactly what we were about to design ourselves (and did some research on, comparing opencode / Aider / Continue / OpenHands / Cline patterns):

- New src/router/vision.ts: curated vision-model allowlist, basename-anchored image-path regex, family-aware sibling picker.
- routeRequest / routeRequestAsync / resolveTierToModel take a new needsVision flag. Auto routing now walks the tier chain for the first vision-capable model when an image is in play; escalates to COMPLEX (Opus) if the whole tier is text-only.
- Manual-mode guard in agent/loop.ts: detects image refs in user input on turn 1, swaps the user's text-only pick to the closest family vision sibling for ONE turn with a visible warning. Next turn's baseModel recovery restores the user's pick.
- proxy/server.ts mirrors the same logic on the Anthropic proxy path (scans messages[] for image / image_url / input_image parts plus paths in text parts).
- 5 new tests; 373/373 pass total.

Better than the design we discussed: their swap-with-warning single-turn approach beats the silent-strip pattern that opencode / Continue / OpenHands all use, by avoiding the "user can't tell what model is running" failure mode of silent model substitution.
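The tier-chain walk can be sketched in a few lines. The model table and tier names below are illustrative stand-ins; the real allowlist lives in src/router/vision.ts:

```typescript
// Illustrative model table; the real allowlist lives in src/router/vision.ts.
interface ModelInfo {
  id: string;
  vision: boolean;
}

// Tier chain ordered cheap → expensive; the last tier plays the role of COMPLEX.
const TIER_CHAIN: ModelInfo[][] = [
  [{ id: "mini-text", vision: false }],
  [{ id: "mid-vision", vision: true }, { id: "mid-text", vision: false }],
  [{ id: "opus", vision: true }], // COMPLEX tier
];

function resolveModel(startTier: number, needsVision: boolean): string {
  if (!needsVision) return TIER_CHAIN[startTier][0].id;
  // Walk up the chain for the first vision-capable model; if every tier
  // below COMPLEX is text-only, we land on the last (COMPLEX) tier.
  for (let t = startTier; t < TIER_CHAIN.length; t++) {
    const hit = TIER_CHAIN[t].find(m => m.vision);
    if (hit) return hit.id;
  }
  return TIER_CHAIN[TIER_CHAIN.length - 1][0].id;
}
```

With an image in play, a request that would normally land on `mini-text` walks up to `mid-vision`; without one, routing is unchanged.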
Pairs with BlockRunAI/blockrun rfc/modal-full-chain (gateway PR).
New capabilities:

- ModalDeployFunction: register a long-running Python function on Modal (custom pip deps, GPU choice, up to 24h timeout). Charges max_timeout × hourly rate upfront — same model as long-task sandbox.
- ModalRunFunction: trigger a deployed function. Returns run_id; poll for result. Compute already paid at deploy.
- ModalGetFunctionStatus: poll a run for status/result/error.
- ModalCreateVolume: create persistent storage. $0.20/GB-month, 1mo prepaid. Up to 200GB per wallet.
- ModalListVolumes: list caller's volumes.
- ModalDeleteVolume: delete a volume (no refund).

These close the gap between the 24h-capped Sandbox path and the long-running ML workflows agents need (fine-tuning, batch jobs, persistent checkpoints). Smart-rebate / actual-usage settlement is Phase B (documented separately in the gateway team's Notion checklist) — v1 charges upfront and does not refund early finish.

Wire-level design: see the RFC in BlockRunAI/blockrun (rfc/modal-full-chain). The gateway must be deployed first; this client PR is a no-op until then.
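The upfront-charge arithmetic is simple but worth pinning down. A sketch under the rates stated in this PR ($0.10/h CPU, $8/h H100, $0.20/GB-month, 200GB cap); the function names and rate table are illustrative, not the real tools/modal.ts API:

```typescript
// Illustrative hourly tiers from the PR text; the real table lives gateway-side.
const HOURLY_USD: Record<string, number> = { cpu: 0.1, H100: 8.0 };

const VOLUME_USD_PER_GB_MONTH = 0.2;
const MAX_VOLUME_GB = 200;

// v1 charges the full max_timeout at deploy; early finish is NOT refunded,
// so over-allocating the timeout directly wastes USDC.
function deployCharge(maxTimeoutHours: number, gpu: string): number {
  if (maxTimeoutHours <= 0 || maxTimeoutHours > 24) {
    throw new Error("timeout must be within (0, 24] hours");
  }
  const rate = HOURLY_USD[gpu];
  if (rate === undefined) throw new Error(`unknown tier: ${gpu}`);
  return maxTimeoutHours * rate;
}

// Volumes prepay one month of storage at creation time.
function volumeCharge(sizeGb: number): number {
  if (sizeGb <= 0 || sizeGb > MAX_VOLUME_GB) {
    throw new Error(`size must be within (0, ${MAX_VOLUME_GB}] GB`);
  }
  return sizeGb * VOLUME_USD_PER_GB_MONTH;
}
```

So a 2h H100 deploy is charged $16 even if the function finishes in 20 minutes; the refund gap is exactly what Phase B's actual-usage settlement is meant to close.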
Franklin's main hadn't run CI since 2026-04-21; a package.json change landed without an accompanying lockfile bump, so 'npm ci' fails with:

    npm error Missing: utf-8-validate@5.0.10 from lock file

Regenerated cleanly via 'rm -rf node_modules package-lock.json && npm install'. The lockfile is now in sync with the current package.json.

This commit is unrelated to the Modal capabilities being added in this PR — included solely to unblock CI on this branch (and incidentally on main too).
Summary
Adds 6 new Modal capabilities for long-running GPU workflows. Pairs with BlockRunAI/blockrun#16 — the gateway must be merged + deployed first; this PR is a no-op until then.
New capabilities

- ModalDeployFunction: register a long-running Python function (custom pip deps, GPU choice, up to 24h timeout); charges max_timeout × hourly rate upfront
- ModalRunFunction: trigger a deployed function; returns run_id, compute already paid at deploy
- ModalGetFunctionStatus: poll a run for status/result/error
- ModalCreateVolume: persistent storage, $0.20/GB-month, 1mo prepaid, up to 200GB per wallet
- ModalListVolumes: list caller's volumes
- ModalDeleteVolume: delete a volume (no refund)
Use case
Closes the gap between the existing 24h-capped Sandbox path (`ModalCreate`) and real ML workflows: fine-tuning, batch jobs with checkpoints, multi-day data pipelines.
Pricing
v1 charges upfront at deploy time using the same hourly tiers as long-task sandbox ($0.10/h CPU → $8/h H100). NO REFUND on early termination — over-allocating `timeout` wastes USDC.
Smart-rebate / actual-usage settlement is Phase B, documented in the gateway team's Notion checklist.
Test plan
No breaking changes
All existing ModalCreate/Exec/Status/Terminate capabilities remain. New capabilities are additive in the `modalCapabilities` array.
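The additive claim can be sketched concretely. The `Capability` shape and the stub `run` bodies below are hypothetical; the real definitions live in tools/modal.ts:

```typescript
// Hypothetical capability shape; the real one lives in tools/modal.ts.
interface Capability {
  name: string;
  run: () => string;
}

// Existing sandbox capabilities are left exactly as they were.
const existing: Capability[] = [
  { name: "ModalCreate", run: () => "sandbox created" },
  { name: "ModalExec", run: () => "command executed" },
  { name: "ModalStatus", run: () => "status" },
  { name: "ModalTerminate", run: () => "terminated" },
];

// Additive: new entries are appended after the existing ones, so any
// caller iterating modalCapabilities sees a strict superset and existing
// tool names keep their positions.
const modalCapabilities: Capability[] = [
  ...existing,
  { name: "ModalDeployFunction", run: () => "deployed" },
  { name: "ModalRunFunction", run: () => "run started" },
  { name: "ModalGetFunctionStatus", run: () => "run status" },
  { name: "ModalCreateVolume", run: () => "volume created" },
  { name: "ModalListVolumes", run: () => "volumes" },
  { name: "ModalDeleteVolume", run: () => "volume deleted" },
];
```

Appending (rather than reordering or renaming) is what makes the change safe for the agent's tool inventory: nothing that resolved a capability by name before this PR resolves differently after it.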