design: docs automation #804
Conversation
> In this proposal, we split the problem space into two distinct domains with corresponding workflows:

> 1. source change → docs. Event based. A developer has merged a diff and the docs need to reflect it.

Do we need event-driven, or would a cron job do?

**Documentation Preview Ready**
Your documentation preview has been successfully deployed!
Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-804/docs/user-guide/quickstart/overview/
Updated at: 2026-05-01T17:08:28.014Z
> # Strands Documentation Agent/s Review

> In short, these tools allow the main agent to spawn individual Agents, but don't feel like a purpose-built orchestrator-worker protocol/abstraction.

> Upcoming async/background tools would be a necessary piece to provide such constructs. With background agents-as-tools, a construct

> Just a single Agent alone with the same tool set also had a high baseline latency of ~5-12 minutes.
> In terms of signing off on the docs agent, perhaps the current 7-15 minutes is acceptable. In any

This is 100% acceptable to me; imho anything with no impact on mcm release is acceptable. Scale of days, not minutes.

> When I started this work I naively assumed that a documentation automation was going to be really simple in its implementation and the open questions were going to center around distribution and runtime choices. This did not turn out to be the case. Balancing correctness and latency has turned out to be really tricky. At the time of writing, the implementation does not solve for latency.

How important is latency for this use case, given the existing latency of making doc updates manually?
> Since releasing TypeScript side-by-side with Python in the docs, the workflow for creating effective documentation has become more complicated. TypeScript code samples go in `.ts` snippets while Python gets inlined in markdown. `<Tabs>` blocks present each language's flavor of a feature,

Tangential: let's get rid of tabs. They don't add anything, they make the page crowded, they complicate docs writing, and they cause structural problems.

Ryan is (as am I) aligned on this, especially now that we have the language picker. We're ultimately thinking that we'd have entirely different pages per language.
> When I started this work I naively assumed that a documentation automation was going to be really simple in its implementation and the open questions were going to center around distribution and runtime choices. This did not turn out to be the case. Balancing correctness and latency has turned out to be really tricky. At the time of writing, the implementation does not solve for latency.

> Balancing correctness and latency has turned out to be really tricky

Do we need to solve for latency? Do we care? It's not a sync workflow, so as far as I care, it can take a day.
> Across all methods attempted, the comprehensiveness of the explore phase was the most important factor in effectiveness. There are two considerations we might take for useful dedicated vended tools:

> 1. A grep tool (find instances within files)

Why not just a shell tool with a sandbox? (I guess we don't have sandboxes yet :P) But the overall point stands.
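To make the vended-tool idea concrete, here is a stdlib-only sketch of what a grep-style "find instances within files" tool could do. The name `grep_files` and its signature are hypothetical, not an existing Strands tool; a sandboxed shell running `grep -rn` would cover the same ground.

```python
import re
from pathlib import Path

def grep_files(pattern: str, root: str, glob: str = "**/*",
               max_results: int = 100) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line matching pattern.

    Hypothetical sketch of a vended grep tool for agent exploration.
    """
    rx = re.compile(pattern)
    matches: list[tuple[str, int, str]] = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                matches.append((str(path), lineno, line.strip()))
                if len(matches) >= max_results:
                    return matches
    return matches
```

Capping `max_results` matters for agents: an unbounded match list from a large repo would blow up the context window during the explore phase.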
> ## Proposed Docs Agent Pipeline

> To handle the different contexts where the docs agent needs to run (pr, issue, revision) we add contextualize skills as the first step in the

Does the docs agent need to run in different contexts? Why not just run it at 9am each day, on the PRs that were merged yesterday?

The design of the agent is 1:1 with the PR. It's true that we could set up something similar: a daily run batching over merged PRs. That would generate the same runs. What we'd lose with that approach is the context window of the dev. They wouldn't be able to merge a change → see the generated PR → review it and approve or give comments → close the issue out. So kicking off runs live at merge is about aligning with the human in the loop.
> ### Limitations: Latency and Cost

> While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for

Is this a graph-specific thing, or did it apply to everything?

The graph ran 20+ minutes, with 20 minutes being both the average and the floor. Alternatives were able to get the minimum and average run times significantly reduced.

> latency for docs generation. When comparing to other code generation tools like CC/Codex, it's troubling that the role-style design experimented with was so slow.
> Just a single Agent alone with the same tool set also had a high baseline latency of ~5-12 minutes.

What did it do in those 5-12 minutes? What's the latency breakdown?

> With the understanding that we're expecting to iterate, building out the proposed docs agent and audit agent would have rough edges, but should deliver immediate value.
> If we align on moving forward with implementation, the main open question is whether to start with the reusable `/strands docs` runner now, or wait for the monorepo to avoid short-lived wiring.

My assumption is that we will need a human-led docs overhaul for the v2a releases, so I am pro-waiting for the monorepo for a concrete implementation.

> So, for now starting with the re-usable `/strands docs` runner is a no-regret choice.
> ### Why not run this locally with a SKILL.md

Eww. I want async; why do I need to bother asking an agent to write docs? It should just happen.

I mean, I'm not against a shared skill, but we shouldn't limit it to that.

```
INPUT TYPE: issue    → {contextualize-issue} skill
INPUT TYPE: revision → {contextualize-comments} skill + S3SessionManager
        │
        ▼
```
How could we leverage After Tool Call / After Invocation event hooks here? I'm specifically thinking about the doc-writer → [refiner, validator, audit] → doc-writer loop at the heart of this. I imagine this could help enforce some structured schema/templates per section of the docs repo a bit more deterministically and efficiently than putting everything on the refiner/validator/auditor.

TLDR: constraining the doc-writer output with deterministic guardrails/templates instead of the process being fully agent-driven.

> How could we leverage After Tool Call event hooks here?

What do you mean by this? What are you visualizing? @gautamsirdeshmukh

That's a nice idea. I'd need to think more about the exact logic we'd want on that determinism, but it's definitely an avenue to increase speed. Maybe something like the npm run commands could be leveraged.
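A minimal stdlib-only sketch of that deterministic guardrail idea, assuming a per-section heading template and a `doc_writer` tool name for illustration (this is plain Python, not the actual Strands hook API):

```python
import re

# Hypothetical per-section schema: headings every docs page must contain.
SECTION_TEMPLATE = ["## Overview", "## Usage", "## Reference"]

def validate_section(markdown: str, template=SECTION_TEMPLATE) -> list[str]:
    """Deterministic check a hook could run after each doc-writer tool call.

    Returns the template headings missing from the draft; empty means it passes.
    """
    headings = re.findall(r"^## .+$", markdown, flags=re.MULTILINE)
    return [h for h in template if h not in headings]

def after_tool_call(tool_name: str, output: str) -> dict:
    """Sketch of an after-tool-call hook: bounce template violations straight
    back to the doc-writer instead of routing every structural issue through
    the refiner/validator/auditor agents."""
    if tool_name == "doc_writer":
        missing = validate_section(output)
        if missing:
            return {"retry": True, "feedback": f"missing sections: {missing}"}
    return {"retry": False}
```

Because the check is deterministic, it costs no tokens and gives the doc-writer precise, structured feedback on each retry.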
> In this proposal, we split the problem space into two distinct domains with corresponding workflows:

> 1. source change → docs. Event based. A developer has merged a diff and the docs need to reflect it.
> 2. docs → source as SOT. Proactive sentinel. As a cron-style async job, the docs-auditor agent checks back-and-forth between the state of the docs and the source code. Inconsistencies are raised as issues and then patched by invoking the docs agent from (1)

> As a cron-style async job

I like this idea in general, provided the quality is high enough; even just "randomly look at something and check if it's in sync" would probably catch drift/issues.
> ### Limitations: Latency and Cost

> While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for medium to large diffs. Similarly, token inputs grew very quickly towards 1-5M input tokens per run.

Our planned work for an improved context management system will reduce the number of tokens used. Archival/long-term memory (which we are currently thinking about) for a docs agent could both reduce latency and increase quality, since it would allow the agent to do semantic search on both the code base and the docs.
> I first reached for a graph because it fit my mental model of the necessary flow.

> `explore -> doc_writer -> refiner -> validator -> language_parity -> ui_tester`

Is that in an isolated runtime/sandbox? The main idea is to achieve better validation/eval results; an agent should not do self-review.
> 👀 → neutral / needs review

> We can also apply the same approach to our existing `/impl` and `/review` workflows. The biggest downside of the

I've seen some folks do this. How would we collect the data and improve the system? Just ask kiro/cc/codex/whatever?
> By setting up a GH action workflow, we can automatically run the docs agent on PR merge. We can re-use existing work like the tools and utilities defined in `devtools`.
> ## Experimentations and Learnings

Can the docs agent handle conflicts? Two PRs might change the same pages and the agent has to deal with that, but I believe it is possible? Also, I'm expecting the author to approve, and an external contributor's approval wouldn't count, right?

Each PR would automatically generate its own run. When there's dependent interplay and those PRs are irreconcilable, a fresh issue citing both PRs could be raised. From what I can tell this would be an uncommon case. And yes, we'd need a maintainer approval on every run as usual.
> ─ ─ ─ = findings from audit/ui-tester re-enter doc-writer, which re-runs

> ### Limitations: Latency and Cost

I guess we should also think about accuracy and making the content human-readable, easy to understand, and user-friendly?
> In our GitHub environment a feedback comment automation would be a simple solution.

> Using the available GitHub emojis:

I like this as a feedback mechanism!
> ### Limitations: Latency and Cost

> While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for

For the snippets and examples we present in the docs, I would like the agent to run testing. That is currently what I have done. It adds to the latency of course, but it has been worth it.
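A snippet-testing step like the one described could look like this stdlib-only sketch. The extraction regex and runner are assumptions about how the agent does it; this only covers inlined Python blocks, and the `.ts` snippets would need a separate `tsc`/`node` runner.

```python
import re
import subprocess
import sys
import tempfile

def extract_python_snippets(markdown: str) -> list[str]:
    """Pull the bodies of fenced ```python blocks out of a docs page."""
    return re.findall(r"```python\n(.*?)```", markdown, flags=re.DOTALL)

def snippet_passes(code: str, timeout: int = 30) -> bool:
    """Run one snippet in a fresh interpreter; non-zero exit means it's broken."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=timeout)
    return result.returncode == 0
```

Running each snippet in a subprocess keeps a broken example from taking down the validator, at the cost of one interpreter start-up per snippet, which is part of the latency trade-off mentioned above.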
> Approaches like these could be attempted by using the `use_agent` tool in the community tools repo or alternatively by using the experimental agent-as-config approach. However, neither option is ideal for a clean orchestrator–worker implementation: they only support sequential, blocking invocations.

Is this related to agent teams? What's the blocking sequential limitation? Technically an LLM can call multiple tools at once; we'd need to batch the ends (but that's also what graph does today).

@gautamsirdeshmukh can answer for agent teams. In this section, I'm trying to note that we have tools that feel adjacent but are not a perfect fit for an orchestrator-worker pool design. Sequential is workable but it's not ideal.
> Rate `/strands docs agent` output on this PR:

> 👍 → good / acceptable

We can/should just start collecting all feedback on our agent comments as a way to improve; our existing agents could use the same data TBH.
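Collecting that feedback could start from the GitHub reactions API (`GET /repos/{owner}/{repo}/issues/comments/{comment_id}/reactions`). The verdict mapping below follows the proposal's emoji scheme; the aggregation helper itself is an illustrative sketch.

```python
from collections import Counter

# GitHub reaction names mapped to the proposal's emoji scheme:
# "+1" is 👍 (good/acceptable), "-1" is 👎, "eyes" is 👀 (neutral/needs review).
VERDICTS = {"+1": "good", "-1": "bad", "eyes": "needs_review"}

def tally_feedback(reactions: list[dict]) -> dict[str, int]:
    """Aggregate reaction payloads (shaped like the GitHub reactions API
    response items) into verdict counts for one agent comment."""
    return dict(Counter(VERDICTS.get(r["content"], "other") for r in reactions))
```

Run across all agent comments over time, this would give a per-workflow quality signal that `/strands docs`, `/impl`, and `/review` could all share.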
> ### Coordinating Concurrent Agents

> A small tool-as-class `SharedLedger` can be used to accommodate many Agents interacting with the same file. Each Agent has a `write_ledger`

Who are these conflicting agents? Why do we have multiple agents trying to write to the same filesystem project at the same time? Why not split those?
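A minimal sketch of such a ledger, keeping the `write_ledger` name from the proposal. The conflict semantics and JSON storage are illustrative, and the in-process lock would need to become OS-level file locking if the agents run as separate processes.

```python
import json
import threading
from pathlib import Path

class SharedLedger:
    """Tool-as-class sketch: agents record a claim before writing a file,
    so concurrent workers detect overlap instead of clobbering each other."""

    def __init__(self, path: str):
        self._path = Path(path)
        self._lock = threading.Lock()  # in-process only; see note above
        if not self._path.exists():
            self._path.write_text("[]")

    def write_ledger(self, agent: str, file: str, intent: str) -> list[dict]:
        """Append this agent's claim and return any conflicting entries
        (claims on the same file by a different agent)."""
        with self._lock:
            entries = json.loads(self._path.read_text())
            conflicts = [e for e in entries
                         if e["file"] == file and e["agent"] != agent]
            entries.append({"agent": agent, "file": file, "intent": intent})
            self._path.write_text(json.dumps(entries))
        return conflicts
```

An agent that gets a non-empty conflict list back could either wait, pick a different file, or surface the overlap to the orchestrator.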
> We'd lose two things by taking that choice:

> 1) The opportunity to dogfood concurrent agent coordination tooling

We could dogfood the new context stuff and sandboxes :)
> The docs audit agent has the potential to be very annoying. If it flags issues which are not definitive, issues which were already flagged, or produces any other erroneous output, we will be tempted to turn it off.

Did you consider the agent having one long-running ticket that it always reports to? Checking it once a week and approving or discarding suggested updates could be part of our oncall.
> ## Recommendation

> Start with a reusable `/strands docs` GitHub Actions runner using the proposed main-agent + fresh audit-subagent pipeline. Treat the current latency as acceptable for initial dogfooding, but track it explicitly. Defer fully automatic PR-merge kickoff until the monorepo removes cross-repo permission issues.

We should get to the monorepo relatively soon.
> With the understanding that we're expecting to iterate, building out the proposed docs agent and audit agent would have rough edges, but should deliver immediate value.

> If we align on moving forward with implementation, the main open question is whether to start with the reusable `/strands docs` runner now, or wait for the monorepo to avoid short-lived wiring.

What would be the short-lived wiring? Don't we add some stuff to devtools/strands-command anyways? What would need to change, invocations?
> Either way, the experiment surfaced useful Strands follow-up areas: file exploration tools, fresh-context audit patterns, workflow feedback collection, and multi-agent coordination.

> We also might look to convert some of the learnings around "how do I model my multi-step workflow in Strands" into a page in our docs.

What if the docs agent could propose blog posts based on newly merged features?
> ## Background

> Writing documentation for Strands features and capabilities is the important final step

I think this needs more of a product alignment decision, but if we want to automate docs changes 100% (or honestly, even if we continue manually), we should have some sort of common structure / ethos / guidance / tenets for our docs site content. I would be interested in seeing a proposal for that in the future. Right now, we developers follow the model of "adding whatever we happen to think is needed to understand the feature" (the bar is really low for shipping internal and community docs changes imo), but I think some sort of aligned requirements for documentation should be introduced in the v2 docs overhaul, which this agent can then abide by in its autonomous execution. Our docs site is already highly scattered and has scaled in cluttered and non-uniform ways; information is buried under layers and layers of headers. I think automating docs changes is a must for Strands to scale, but I worry that if we do not have some structural integrity/tenets behind these changes, we'll eventually just end up with docs slop.

Agreed on the need for common structure / ethos / guidance / tenets guidelines. But I'd strongly push back against 100% anything.

> v2 docs overhaul, which this agent can then abide by in its autonomous execution.

I'm also guessing we're not going to have any big v2 overhaul at this point; everything is going to be incremental AFAICT, so we should start documenting this now with that in mind.

That said, @ryanycoleman is going to be working towards adding some pre-defined skills/guidelines that he's been developing to the docs site.
> Since we're already working in GitHub and have existing GH Actions devtools, we can follow the same pattern as `/strands impl` and `/strands review` and use GitHub Action runners.

> By setting up a GH action workflow, we can automatically run the docs agent on PR merge. We can re-use existing work like the tools and utilities defined in `devtools`.

Triggering the agent on merge makes sense from the standpoint of streamlining our development efforts, but it does put the decision of "does this change require a doc update" in the hands of the agent instead of the human (i.e. making the trigger the `/strands docs` command). This isn't a breaking issue, but it would be a cause of extra churn if that first decision step isn't nailed down.
> ─ ─ ─ = findings from audit/ui-tester re-enter doc-writer, which re-runs

> ### Limitations: Latency and Cost

What if we had an agent develop doc templates across the pages? Going forward, the docs agent would fill in these templates, which might speed up delivery. It could potentially help reduce context and ensure better consistency. Generally speaking, maybe there is some preliminary scaffolding we need to set up to get our docs agents running more effectively.
Description
Documentation Automations. Docs agent and docs audit agent.
Reference for source code with agent implementations: https://github.com/notowen333/strands-docs-agents
Type of Change