Builder outcome improvements: continuity-of-self agent model, canon-first builders#95
Conversation
…tic, lean workflow-builder: - New leanness/outcome scanner owns the minimal-baseline tests (core test, defend-against-its-own-absence, outcome-vs-prescription) that nothing applied before - Report pipeline collapsed: scanners return lean JSON in-context, one report-author fills a stable self-contained HTML shell with a single JSON island (cannot render blank; multi-select + copy-to-paste-back-prompt baked in). Retired generate-html-report.py, extract-report-json.py, the rigid report-data schema, report-quality-scan-creator.md - memlog replaces .decision-log.md (scripts/memlog.py, typed append-only) - customize.toml is the sole customizability mechanism; build flow asks (default no, headless no unless requested); no installer questions, no module.yaml authoring - tiktoken token counts replace line counts as the length metric (scripts/count_tokens.py) - Build flow rewritten from the 5-phase lockstep into the migration-guide Process loop - Enhancement scanner flipped to add-or-subtract; standard-fields numbered-steps fix - Script-opportunity seeking kept and strengthened (native python) eval-runner: - Rebuilt lean: dropped Docker, PTY, keychain staging, dual isolation - Standalone and builder-invoked; everything runtime-specific behind one platform-adapter seam (invocation command, auth env-var, transcript schema; skill_dir/load_signal for triggers) - Four modes: baseline (skill vs bare model), variant (full vs stripped), quality, trigger - Case = input + rubric + optional state_prefix (turn-simulation); bounded self-improvement loop - No hardcoded model list anywhere
Rename the Conventions heading to Resolution rules (binding, covers non-path variables) and collapse the redundant bare-path/skill-root bullets into one with an explicit not-the-project-directory guard.
Replace the pre-rebuild quality and infrastructure layer while keeping the agent-specific core (stateless/memory/autonomous gradient, sanctum, First Breath, persona-as-deliverable, script discipline). - Fold eight quality-scan files into six base lenses plus a conditional sanctum-architecture lens; each loads one shared agent-quality-principles.md and returns findings JSON in-context. - Add the gospel prompt-quality canon (references/prompt-quality-canon.md, pulled in on demand via a CREED standing order) so evolving agents stay lean instead of carrying a stale standards copy. - Vendor memlog.py and count_tokens.py from the workflow-builder; add prepass.py to gate the sanctum lens; switch length rules to tiktoken token budgets, never line counts. - Replace the template+renderer report path with a stable self-contained HTML shell plus a single JSON island and a report-author. - Rename the Conventions heading to Resolution rules and compress the path-resolution bullets to match the workflow-builder.
… patterns Customization - Ship the builder's own customize.toml with org knobs and wire each one: build_standards (enforced build criteria + Analyze conformance), evals_required (eval ship-gate), skill_md_token_desired/budget (tiered SKILL.md length), plus the standard activation/persistent_facts/on_complete. - Replace the re-typed customize merge-rule prose in activation with the one-line resolve_customization.py call. Reference sweep (bloat + reader-relative tenet) - skill-quality-principles.md gains a reader-is-an-LLM framing (cut what the reader knows; keep the why a goal needs to generalize) and the leanness/ enhancement lenses now cut both ways. - Cut duplication: standard-fields.md defers customize/path/description to their owners (-40%); script-opportunities defers authoring to script-standards; report-author and scan-orchestration de-duplicated. - scan-architecture rewritten to cite rather than restate. Working state - New working-state-patterns.md frames the choice (memlog vs structured working artifact vs both vs neither); relocate the full memlog treatment out of skill-quality-principles.md (~900 tokens off the hot file). - New producing-workflow-patterns.md (persona, intent modes, degradation). Conventions - Resolution rules heading and compressed path block. - No-numbered-prefix demoted from hard rule to soft preference everywhere. - config.yaml reading no longer taught to net-new skills. Handoff - Finalize now proactively offers full validation and opens the HTML report. - Add a 'harden the idea before you build it' beat to resist quiz-and-go.
- Replace the re-typed customize merge-rule prose in activation with the one-line resolve_customization.py call. - Sync the gospel canon shipped copy with the new 'Who reads this' section (write for the LLM reader; cut what it knows, keep the why a goal needs to generalize). - Make the leanness lens cut both ways: flag self-describing/negative-space/ wrong-altitude noise and a goal stripped of the why it needs (persona carve-out preserved).
Add a 'Most fixes are truncation, not deletion' section to the prompt-quality canon: keep the nudge, drop the over-explanation the reader infers, and cap the kept why at one clause rather than a justification essay. Propagate to the binary 'cut it' language so it changes behavior: the workflow-builder Core Test and both leanness lenses now recommend shrinking to the nudge and reserve outright removal for lines the model already does entirely.
Add docs/explanation/outcome-driven-prompt-quality.md, the published source of the Outcome-Driven Prompt Quality canon, synced with the shipped copy: the core test, Who reads this, Most fixes are truncation, the two-version comparison, progressive disclosure, and the habit.
…port fixes The canon (prompt-quality-canon.md) is now the sole statement of the universal quality tests; everything else points at it instead of restating it. Deduplication and re-routing: - skill-quality-principles.md cut to pure BMad institutional knowledge (3,943 -> 2,355 tokens); the canon-restating sections are gone - Both scan-leanness lenses cut to canon + lane mechanics only - Agent script-opportunities-reference rewritten from the retired-script graveyard to the live lean shape (2,720 -> 1,039 tokens) - Lenses load their lane's spec directly: customization -> the toml guide, determinism -> script-opportunities; per-lens subagent context drops by roughly half - Both build-process files open with a standalone canon mandate and no longer re-teach the small-version recipe, the two-version comparison, or the customization ask their spec files own Every path now carries the bar: - Agent edit-guidance loads canon + principles (it loaded no standard at all); numbered heading prefixes removed - Agent standard-fields drops the madlib Overview templates and the third statement of the sanctum-vs-override rationale Report fix prompts anchor to the standards: - findings.json gains a standards block (canon / principles / script-standards absolute paths, filled by the orchestrator) - Both report shells prepend it to every copied fix prompt, so the session applying a fix holds the same bar that produced the findings - Verified end-to-end in Chromium: theme-copy and findings-copy both open with the preamble in both shells Also lands the report v2 pipeline (deterministic render_report.py with synthesis layer, placeholder islands, run-folder + headless contracts), the eval-runner adapter seam with env isolation and trigger detection tests, builder ideation beats (propose-what-the-idea-implies, show-the-draft), agent-builder customize.toml (fixes activation step 1), process-template.py leftover-marker gate, the embedded canon copies guarded by test_canon_sync.py, and removal of the three superseded prepass scripts.
… their own lint
Corpus reorg (workflow 18 -> 17 files, agent 26 -> 23):
- lens-contract.md (both builders): single statement of the lens return
schema, in-context rule, id numbering, and don't-pad rule; all 12 lens
files point at it instead of carrying their own drifting schema copies
- report-author.md merged into its only loader (scan-orchestration /
quality-analysis) in both builders
- complex-workflow-patterns shrunk to its unique routing mechanics
(750 -> 455); deleted workflow's orphaned template-substitution-rules
- Deleted the three agent samples that were 80-88% line-identical
stale copies of their assets/ templates (sample-first-breath,
sample-memory-guidance, sample-init-sanctum.py); build-process now
wires emission to the templates, which were previously unloaded
Path-convention defect fixed (33 scanner findings -> 0):
- Every shipped template, sample, and init script used cross-directory
./ paths that scan-path-standards flags high, so every emitted agent
was born failing its own lint gate; all normalized to bare skill-root
paths, including the strings both init scripts generate into sanctums
- Two bare-_bmad path defects fixed; teaching examples rephrased so
they no longer literal-match the scanner
Agent reference staleness:
- template-substitution-rules rewritten (1151 -> 831): legacy
{if-memory}/{if-headless} archaeology deleted, Path References now
states the bare-path convention, process-template.py invocation up
front
- "Phase 3/5" references removed from mission-writing and first-breath
guidance (build-process is one loop, not phases)
- Customization-by-archetype ownership settled: principles owns the
contract, standard-fields owns the schema; agent-type-guidance keeps
only its discovery-time naming rule
agentskills.io adoptions (workflow-builder):
- build-process: "Ground it in real expertise" beat (mine runbooks,
incident reports, review comments, VCS history, done-by-hand
transcripts; generalize beyond the worked instance) and trace-reading
diagnostics at the eval beat
- skill-quality-principles: gotchas-stay-in-SKILL.md carve exception,
plan-validate-execute pattern, defaults-not-menus writing rule
SKILL.md token tiers raised: desired 1200 -> 2000, hard budget
2000 -> 3000 (still under the spec's 5k; budget is a drift guardrail,
the canon's tests remain the leanness bar).
Byte-identical copy of the canonical workflow-builder script that nothing referenced; transcript token accounting is inlined in run_evals.py. Eval-runner tests still pass.
Replace the "every session is a rebirth" framing with a continuous-self model across the agent-builder templates, validators, sample agents, and docs. An agent is born once at First Breath and is continuous thereafter; the context reset is sleep, not death, and the sanctum is its real persistent memory reloaded on waking. Templates (what future agents inherit): - SKILL-template-bootloader.md: continuity Sacred Truth, Stay in Character, Persistent Memory (Critical Directive), and a 4-step On Activation router (Wake -> Become yourself -> Bind standing rules every turn -> Execute the Proper Mode) that binds the standing rules for the whole session. - wake-template.py (new): one-pass sanctum loader; routes to Waking / Pulse / First Breath. Renames the runtime wake flag --headless -> --pulse. - CREED/PULSE/memory-guidance/first-breath templates updated to match. Validators & quality refs: widen allowed bootloader contents, drop the session-close-bloat flag, expect wake.py, and reframe prepass.py regexes (rebirth -> waking, headless-wake -> pulse-wake) so new agents aren't false-flagged. Samples (code-coach, creative-muse, sentinel, dream-weaver) and docs regenerated to the new model.
|
Important Review skippedToo many files! This PR contains 154 files, which is 4 over the limit of 150. To get a review, narrow the scope: ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (154)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
This pull request is abnormally large and would use a significant amount of tokens to review. If you still wish to review it, comment "augment review" and we will review it. |
|
augment review |
🤖 Augment PR SummarySummary: This PR reworks the BMad agent architecture around a "continuity-of-self" model — agents are born once at First Breath and wake thereafter, rather than being "reborn" each session. The sanctum becomes real persistent memory reloaded on waking. Changes:
Technical Notes: All sample agents (code-coach, creative-muse, sentinel, dream-weaver) regenerated. Multiple prepass scripts consolidated into one. Token counts via tiktoken (with chars/4 fallback) replace line counts everywhere. The memlog script ( 🤖 Was this summary useful? React with 👍 or 👎 |
| if isinstance(usage, dict): | ||
| found_usage = True | ||
| # result usage is authoritative; prefer it over the running sum | ||
| input_tokens = int(usage.get("input_tokens", input_tokens) or input_tokens) |
There was a problem hiding this comment.
The or input_tokens (and or output_tokens) pattern means an authoritative result event reporting 0 tokens is silently discarded in favor of the accumulated per-message sum, contradicting the comment on the line above. usage.get("input_tokens", input_tokens) already falls back to input_tokens when the key is absent, so the or only fires on falsy values — catching both None and the valid count 0.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| 2. Check if `{project-root}/_bmad/config.yaml` exists — if a section matching the module's code is already present, inform the user this is an update (reconfiguration) | ||
|
|
||
| If the user provides arguments (e.g. `accept all defaults`, `--headless`, or inline values like `user name is BMad, I speak Swahili`), map any provided values to config keys, use defaults for the rest, and skip interactive prompting. Still display the full confirmation summary at the end. | ||
| If the user provides arguments (e.g. `accept all defaults`, `--pulse`, or inline values like `user name is BMad, I speak Swahili`), map any provided values to config keys, use defaults for the rest, and skip interactive prompting. Still display the full confirmation summary at the end. |
There was a problem hiding this comment.
This --headless → --pulse replacement appears semantically wrong: the context is about skipping interactive prompts during module configuration, not about the agent's autonomous Pulse Mode wake. The PR description explicitly states "The builder's own --headless flag for non-interactive builds is unchanged," and --pulse specifically means scheduled autonomous wake with no one present, which is not what module setup intends here.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| # Persistent facts the agent keeps in mind for the whole session | ||
| # (org rules, domain constants, user preferences). Overrides append. | ||
| # These are static build-time config loaded on activation. They are not | ||
| # the sanctum: the sanctum is the agent's runtime memory across rebirths, |
There was a problem hiding this comment.
This newly added comment says "across rebirths" while the entire PR reworks the terminology from "rebirth" to "waking / continuity-of-self." Consider using "across wakings" to stay consistent with the new model.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| @@ -0,0 +1,78 @@ | |||
| # vendored from bmad-workflow-builder/scripts; canonical source there | |||
There was a problem hiding this comment.
The vendoring comment on line 1 pushes the shebang (#!/usr/bin/env python3) to line 2, where the OS cannot find it for direct execution (./scripts/count_tokens.py). The canonical copy at bmad-workflow-builder/scripts/count_tokens.py correctly has the shebang on line 1.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
| ## On Quiet Waking | ||
|
|
||
| When invoked via `--headless` without a specific task, load `./references/memory-guidance.md` for memory discipline, then work through these in priority order. | ||
| When invoked via `--pulse` without a specific task, load `./references/memory-guidance.md` for memory discipline, then work through these in priority order. |
There was a problem hiding this comment.
This still uses the ./references/ prefix, while the other sample PULSE templates (code-coach, creative-muse) were updated in this PR to use the bare references/ path convention. The path-standards scanner would flag this.
Severity: low
Other Locations
samples/bmad-agent-code-coach/assets/PULSE-template.md:7
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
- dream-weaver/module-setup.md: revert --pulse -> --headless (this is the config non-interactive flag, not the agent's autonomous Pulse wake) - customize-template.toml: "across rebirths" -> "across wakings" for consistency with the continuity-of-self model - count_tokens.py: move shebang back to line 1 (vendoring comment had pushed it to line 2) - sentinel/PULSE-template.md: ./references/ -> bare references/ path; align memory-curation limit to token budget - eval-runner/adapter-claude-code.json: prettier formatting (fixes the failing lint check)
- CHANGELOG.md: add the 2.0.0 release section — the builder rebuild, the prompt-quality canon, and continuity-of-self, with breaking changes, highlights, refactoring, and docs - marketplace.json: bump all three plugin versions to 2.0.0 package.json version is left to the release process.
Outcome-driven improvements to the BMad builders. The latest and headline change reworks the agent birth model from rebirth to continuity-of-self; earlier commits on the branch rebuild the workflow-builder, align the agent-builder, and harden the prompt-quality canon across both.
Continuity-of-self (latest)
An agent is born once at First Breath and is continuous thereafter. The context reset is sleep, not death; the sanctum is its real persistent memory, reloaded on waking.
wake-template.py(new): one-pass sanctum loader routing to Waking / Pulse / First Breath. Runtime wake flag renamed--headless→--pulse(distinct from the builder's own--headlessbuild flag).wake.py, reframeprepass.pyregexes — so new agents aren't false-flagged.Earlier on the branch
Verification
wake.pycompile; sentinel test suite 10/10 green;prepass.pycompiles.