
design: docs automation#804

Open
notowen333 wants to merge 1 commit into strands-agents:main from notowen333:docs-automations

Conversation

@notowen333
Contributor

@notowen333 notowen333 commented May 1, 2026

Description

Documentation Automations. Docs agent and docs audit agent.

Reference for source code with agent implementations: https://github.com/notowen333/strands-docs-agents

Type of Change

  • Other (please describe):


In this proposal, we split the problem space into two distinct domains with corresponding workflows:

1. source change → docs. Event based. A developer has merged a diff and the docs need to reflect it.
Contributor

do we need event-driven, or would a cron job do?

@github-actions
Contributor

github-actions Bot commented May 1, 2026

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-804/docs/user-guide/quickstart/overview/

Updated at: 2026-05-01T17:08:28.014Z

@@ -0,0 +1,365 @@
# Strands Documentation Agent/s Review
Contributor

In short, these tools allow the main agent to spawn individual Agents, but don't feel like a purpose-built orchestrator-worker
protocol/abstraction.

Upcoming async/background tools would be a necessary piece to provide such constructs. With background agents-as-tools, a construct
Contributor

:) coming soon


Just a single Agent alone with the same tool set also had a high baseline latency of ~5-12 minutes.

In terms of signing off on the docs agent, perhaps the current 7-15 minutes is acceptable. In any
Member

@lizradway lizradway May 1, 2026

this is 100% acceptable to me; imho anything with no impact on mcm release is acceptable. Scale of days, not minutes.


When I started this work I naively assumed that a documentation automation was going to be really simple in its implementation
and the open questions were going to center around distribution and runtime choices. This did not turn out to be the case. Balancing
correctness and latency has turned out to be really tricky. At the time of writing, the implementation does not solve for latency.
Contributor

How important is latency for this use case, given the existing latency to make doc updates manually?


Since releasing TypeScript side-by-side with Python in the docs, the workflow for creating
effective documentation has become more complicated. TypeScript code samples go in `.ts` snippets
while Python gets inlined in markdown. `<Tabs>` blocks present each language's flavor of a feature,
Contributor

tangential: let's get rid of tabs. They don't add anything, they make the page crowded, they complicate docs writing, and cause structural problems.

Member

Ryan is (as am I) aligned on this, esp now that we have the language picker.

We're ultimately thinking that we'd have entirely different pages per-language


When I started this work I naively assumed that a documentation automation was going to be really simple in its implementation
and the open questions were going to center around distribution and runtime choices. This did not turn out to be the case. Balancing
correctness and latency has turned out to be really tricky. At the time of writing, the implementation does not solve for latency.
Contributor

> Balancing correctness and latency has turned out to be really tricky

do we need to solve for latency? Do we care? It's not a sync workflow, so as far as I care, it can take a day

Across all methods attempted the comprehensiveness of the explore phase was the most important factor in effectiveness.
There are two considerations we might take for useful dedicated vended tools:

1. A grep tool (find instances within files)
Contributor

why not just a shell tool with a sandbox? (i guess we don't have sandboxes yet :P) but the overall point stands
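To make the idea concrete, here is a minimal sketch of what a vended grep tool could look like. The function name and signature are illustrative assumptions, not the actual tool API.

```python
# Hypothetical sketch of a dedicated grep tool (find instances within files).
# The name and signature are illustrative, not the vended API.
import re
from pathlib import Path

def grep(pattern: str, root: str = ".", glob: str = "**/*") -> list[str]:
    """Return 'path:lineno:line' entries matching `pattern` under `root`."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if regex.search(line):
                matches.append(f"{path}:{lineno}:{line.strip()}")
    return matches
```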


## Proposed Docs Agent Pipeline

To handle the different contexts where the docs agent needs to run (pr, issue, revision) we add contextualize skills as the first step in the
Contributor

does the docs agent need to run in different context? why not just run it at 9am each day, and run it on the PRs that were merged yesterday?

Contributor Author

The design of the agent is 1:1 with the PR. It's true that we could set up something similar to a daily run batching over merged PRs. That would generate the same runs.

What we'd lose with that approach is the context window of the dev. They wouldn't be able to merge a change -> see the generated PR -> review it and approve or give comments -> and close the issue out.

So it's more about aligning with the human in the loop for why we would want to kick off runs live at merge.


### Limitations: Latency and Cost

While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for
Contributor

is this a graph specific thing, or did it apply to everything?

Contributor Author

The graph ran 20+ minutes, with 20 minutes as both the average and the floor. Alternatives reduced the minimum and average run times significantly.

latency for docs generation. Compared to other code-generation tools like CC/Codex, it's troubling that the role-style design experimented with was so slow.

Just a single Agent alone with the same tool set also had a high baseline latency of ~5-12 minutes.
Contributor

what did it do in that 5-12 minutes? what's the latency breakdown?

With the understanding that we're expecting to iterate, building out the proposed docs agent and audit agent would have rough edges, but should
deliver immediate value.

If we align on moving forward with implementation, the main open question is whether to start with the reusable `/strands docs` runner now, or wait for the monorepo to avoid short-lived wiring.
Member

@lizradway lizradway May 1, 2026

my assumption is that we will need a human-led docs overhaul for the v2a releases, so i am pro-waiting for monorepo for a concrete implementation


So, for now starting with the re-usable `/strands docs` runner is a no-regret choice.

### Why not run this locally with a SKILL.md
Contributor

eww. I want async, why do I need to bother asking an agent to write docs? it should just happen

Contributor

I mean, I'm not against a shared skill, but we shouldn't limit there

INPUT TYPE: issue → {contextualize-issue} skill
INPUT TYPE: revision → {contextualize-comments} skill + S3SessionManager
Contributor

@gautamsirdeshmukh gautamsirdeshmukh May 1, 2026

How could we leverage After Tool Call / After Invocation event hooks here? I'm specifically thinking about the doc-writer -> [refiner, validator, audit] -> doc-writer loop at the heart of this. I imagine this could help enforce some structured schema / templates per section of the docs repo a bit more deterministically and efficiently than putting everything on the refiner/validator/auditor.

Contributor

@gautamsirdeshmukh gautamsirdeshmukh May 1, 2026

TLDR constraining the doc writer output with deterministic guardrails/templates instead of the process being FULLY agent-driven

Member

@lizradway lizradway May 1, 2026

> How could we leverage After Tool Call event hooks here?

what do you mean by this?/what are you visualizing? @gautamsirdeshmukh

Contributor Author

That's a nice idea. I'd need to think more about the exact logic we'd want on that determinism, but it's definitely an avenue to increase speed. Maybe something like the npm run commands could be leveraged.
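The INPUT TYPE dispatch in the excerpt above might reduce to something like the following. The skill names come from the proposal, but the stub bodies (and the S3SessionManager wiring) are placeholder assumptions.

```python
# Illustrative dispatch for the contextualize step. Skill names follow the
# proposal; the stub bodies below are placeholders, not real implementations.
def contextualize_pr(payload: dict) -> dict:
    return {"kind": "pr", "context": payload.get("diff", "")}

def contextualize_issue(payload: dict) -> dict:
    return {"kind": "issue", "context": payload.get("body", "")}

def contextualize_comments(payload: dict) -> dict:
    # a revision run would also restore prior state (e.g. via S3SessionManager)
    return {"kind": "revision", "context": payload.get("comments", "")}

SKILLS = {
    "pr": contextualize_pr,
    "issue": contextualize_issue,
    "revision": contextualize_comments,
}

def contextualize(input_type: str, payload: dict) -> dict:
    try:
        return SKILLS[input_type](payload)
    except KeyError:
        raise ValueError(f"unknown input type: {input_type}")
```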

In this proposal, we split the problem space into two distinct domains with corresponding workflows:

1. source change → docs. Event based. A developer has merged a diff and the docs need to reflect it.
2. docs → source as SOT. Proactive sentinel. As a cron-style async job, the docs-auditor agent cross-checks the state of the docs against the source code. Inconsistencies are raised as issues and then patched by invoking the docs agent from (1).
Member

> As a cron-style async job

I like this idea in general, provided the quality is high enough; just using the idea of "randomly look at something and check if it's in sync" would probably catch drift/issues

### Limitations: Latency and Cost

While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for
medium to large diffs. Similarly, token inputs grew very quickly towards 1-5M input tokens per run.
Contributor

Our planned work for an improved context management system will reduce the number of tokens used. Archival/long term memory (which we are currently thinking of) for a docs agent could both reduce latency and increase quality since it would allow the agent to do semantic search on both the code base and the docs


I first reached for a graph because it fit my mental model of the necessary flow.

explore -> doc_writer -> refiner -> validator -> language_parity -> ui_tester
Contributor

is that in an isolated runtime/sandbox? the main idea is to achieve better validation/eval results. An agent should not do self-review.

👀 → neutral / needs review
```

We can also apply the same approach to our existing `/impl` and `/review` workflows. The biggest downside of the
Contributor

I've seen some folks do it, how would we collect the data and improve the system? just ask kiro/cc/codex/whatever?


By setting up a GH action workflow, we can automatically run the docs agent on PR merge. We can re-use existing work like the tools and utilities defined in `devtools`.

## Experimentations and Learnings
Contributor

Can the doc agent handle conflicts? Two PRs might change the same pages and the agent has to deal with that, but I believe it is possible?

Also, I'm expecting the author to approve, and the external contributor's approval wouldn't count, right?

Contributor Author

Automatically, each PR would generate its own run.

When there's dependent interplay and those PRs are irreconcilable, a fresh issue citing both PRs could be raised. From what I can tell this would be an uncommon case.

And yes we'd need a maintainer approval on every run like usual

─ ─ ─ = findings from audit/ui-tester re-enter doc-writer, which re-runs
```
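The dashed re-entry edge in the diagram above amounts to a bounded write/audit loop, roughly like this sketch. All names here are illustrative stand-ins for the real agents, not the actual implementation.

```python
# Sketch of the re-entry loop: audit/ui-tester findings feed back into the
# doc writer until the draft is clean or a retry budget is exhausted.
# `write_docs` and `audit` stand in for the real agents (assumptions).
def write_with_audit(change, write_docs, audit, max_rounds: int = 3):
    draft = write_docs(change, findings=None)
    for _ in range(max_rounds):
        findings = audit(draft)
        if not findings:
            break  # audit came back clean
        draft = write_docs(change, findings=findings)
    return draft
```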

### Limitations: Latency and Cost
Contributor

I guess we should also consider accuracy and human-readable, easily understandable, user-friendly content?


In our GitHub environment a feedback comment automation would be a simple solution.

Using the available GitHub emojis:
Contributor

I like this as a feedback mechanism!
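A sketch of how such emoji feedback could be tallied once fetched. The payload shape (`content` values like `"+1"`) follows the GitHub reactions API; the rating legend is taken from the proposal's emoji mapping and is otherwise an assumption.

```python
# Tally emoji feedback on an agent comment. The reaction payload shape
# ({"content": "+1" | "-1" | "eyes", ...}) matches the GitHub reactions API;
# the rating legend mirrors the proposal's emoji mapping (an assumption).
from collections import Counter

RATING = {
    "+1": "good / acceptable",
    "-1": "needs work",
    "eyes": "neutral / needs review",
}

def tally_feedback(reactions: list[dict]) -> dict:
    counts = Counter(r["content"] for r in reactions if r.get("content") in RATING)
    return {RATING[k]: v for k, v in counts.items()}
```

Fetching the reactions themselves (e.g. via the REST reactions endpoint for issue comments) is left out of the sketch.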


### Limitations: Latency and Cost

While experimenting, I optimized for correctness which was unfortunately paired with high latency between 10–20 minutes for
Member

For snippets and examples we present in the docs, I would like the agent to run testing. That is currently what I have done. It adds to the latency of course but it has been worth it.


Approaches like these could be attempted by using the `use_agent` tool in the community tools repo, or alternatively by using the experimental agent-as-config approach. However, neither option is ideal for a clean orchestrator–worker implementation: they only support sequential, blocking invocations.
Contributor

is this related to agent teams? what's the blocking sequential limitation? technically LLM can call multiple tools at once, we'd need to batch the ends (but that's also what graph does today)

Contributor Author

@gautamsirdeshmukh can answer for Agent teams.

I think in this section, I'm trying to note that we have tools that feel adjacent but not a perfect fit for orchestrator-worker pool design. Sequential is workable but it's not ideal


Rate `/strands docs agent` output on this PR:

👍 → good / acceptable
Member

We can/should just start collecting all feedback on our agent comments as a way to improve; our existing agents could use the same data TBH


### Coordinating Concurrent Agents

A small tool-as-class `SharedLedger` can be used to accommodate many Agents interacting with the same file. Each Agent has a `write_ledger`
Contributor

who are these conflicting agents? why do we have multiple agents trying to write to same filesystem project at the same time? why not split those?
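For what it's worth, a minimal version of the `SharedLedger` idea might look like this. Only the class and `write_ledger` names come from the proposal; the body is a sketch assuming in-process threads rather than the real agent runtime.

```python
# Hedged sketch of SharedLedger: an append-only record that serializes
# writes from concurrent agents behind a lock. Only the `SharedLedger` and
# `write_ledger` names come from the proposal; the rest is an assumption.
import threading

class SharedLedger:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def write_ledger(self, agent: str, note: str) -> None:
        """Record which agent touched what, without clobbering other writers."""
        with self._lock:
            self._entries.append((agent, note))

    def read_ledger(self) -> list:
        with self._lock:
            return list(self._entries)
```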


We'd lose two things by taking that choice:

1) The opportunity to dogfood concurrent agent coordination tooling
Contributor

we could dogfood new context stuff and sandboxes :)

Comment on lines +304 to +305
The docs audit agent has the potential to be very annoying. If it flags issues which are not definitive, issues which were already flagged,
or produces any other erroneous output, we will be tempted to turn it off.
Contributor

Did you consider the agent having 1 long-running ticket that it always reports to? It can be part of our oncall to check it once a week and approve or discard suggested updates


## Recommendation

Start with a reusable `/strands docs` GitHub Actions runner using the proposed main-agent + fresh audit-subagent pipeline. Treat the current latency as acceptable for initial dogfooding, but track it explicitly. Defer fully automatic PR-merge kickoff until the monorepo removes cross-repo permission issues.
Contributor

we should get to monorepo relatively soon

With the understanding that we're expecting to iterate, building out the proposed docs agent and audit agent would have rough edges, but should
deliver immediate value.

If we align on moving forward with implementation, the main open question is whether to start with the reusable `/strands docs` runner now, or wait for the monorepo to avoid short-lived wiring.
Contributor

what would be the short-lived wiring? don't we add some stuff to devtools/strands-command anyways? what would need to change, invocations?


Either way, the experiment surfaced useful Strands follow-up areas: file exploration tools, fresh-context audit patterns, workflow feedback collection, and multi-agent coordination.

We also might look to convert some of the learnings around "how do I model my multi-step workflow in Strands" into a page in our docs.
Contributor

a blog post? :)

Contributor

What if the docs agent can propose blog posts based on newly merged features?

Contributor

uff I love this


## Background

Writing documentation for Strands features and capabilities is the important final step
Member

@lizradway lizradway May 1, 2026

i think this might need more of a product alignment decision required here, but if we are wanting to automate docs changes 100% (or honestly, even if we continue manually), we should have some sort of common structure / ethos / guidance / tenets for our docs site content. i would be interested in seeing a proposal for that in the future.

right now, we developers are following the model of "adding whatever we happen to think is needed to understand the feature" (the bar is really low for shipping internal and community docs changes imo), but i think some sort of aligned requirements for documentation should be introduced in the v2 docs overhaul, which this agent can then abide by in its autonomous execution.

our docs site is already highly scattered and has scaled in cluttered and non-uniform ways. information is buried under layers and layers of headers. i think automating docs changes is a must for strands to scale, but i worry that if we do not have some structural integrity/tenets behind these changes, we'll eventually just end up with docs slop.

Member

Agreed on the need for common structure / ethos / guidance / tenets guidelines. But I'd strongly push back against 100% anything.

> v2 docs overhaul, which this agent can then abide by in its autonomous execution.

I'm also guessing we're not going to have any big v2 overhaul at this point; everything is going to be incremental AFAICT so we should start documenting this now with that in mind.

Member

That said, @ryanycoleman is going to be working towards adding some pre-defined skills/guidelines that he's been developing to the docs site

Since we're already working in GitHub and have existing GH Actions devtools, we can follow the same
pattern as `/strands impl` and `/strands review` and use GitHub Action runners.

By setting up a GH action workflow, we can automatically run the docs agent on PR merge. We can re-use existing work like the tools and utilities defined in `devtools`.
Contributor

Triggering the agent on merge makes sense from the standpoint of streamlining our development efforts, but it does put the decision of "does this change require a doc update" in the hands of the agent instead of the human (i.e. making the trigger the /strands docs command). This isn't a breaking issue, but would be a cause of extra churn if that first decision step isn't nailed down.
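One way to keep that first decision partly out of the agent's hands is a cheap deterministic path filter that runs before the agent is ever invoked. The patterns below are purely illustrative assumptions, not the repo's real layout.

```python
# Deterministic pre-filter for "does this merged change plausibly need docs?".
# Path patterns are illustrative assumptions, not the repo's real layout.
import fnmatch

DOC_RELEVANT = ["src/*.py", "src/*.ts", "README.md"]
IGNORED = ["tests/*", "*.lock"]

def needs_doc_review(changed_paths: list[str]) -> bool:
    """True if any changed path looks doc-relevant and is not ignored."""
    for path in changed_paths:
        if any(fnmatch.fnmatch(path, pat) for pat in IGNORED):
            continue
        if any(fnmatch.fnmatch(path, pat) for pat in DOC_RELEVANT):
            return True
    return False
```

A gate like this would only reduce churn on obvious no-op merges; borderline cases would still fall through to the agent (or to a human `/strands docs` trigger).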

─ ─ ─ = findings from audit/ui-tester re-enter doc-writer, which re-runs
```

### Limitations: Latency and Cost
Member

What if we had an agent develop doc templates across the pages? Going forward then the docs agent would fill in these templates which might speed up delivery. It could potentially help reduce context and ensure better consistency.

Generally speaking here, maybe there is some preliminary scaffolding we need to setup to get our docs agents running more effectively.


9 participants