
feat: add codebase-documentor plugin #138

Open
XinyuQu wants to merge 1 commit into awslabs:main from XinyuQu:feat/codebase-documentor


XinyuQu commented Apr 17, 2026

RFC: #79

Summary

Add the codebase-documentor plugin, which performs deep codebase analysis and produces a single CODEBASE_ANALYSIS.md with source-of-truth citations.

This plugin addresses two growing problems identified in the RFC: tribal knowledge loss when engineers leave teams, and the documentation gap created by AI-assisted coding where thousands of lines are generated faster than teams can document them. Engineers inherit codebases where original authors are unavailable, design decisions exist only in someone's head, and AI-generated code works but nobody documented why it's structured that way. The gap between code production speed and documentation speed is widening.

The plugin produces structured, verifiable documentation rather than one-time chat responses. Every finding links back to the specific file and line it was derived from, so readers can verify claims and spot stale documentation when code changes. It uses an iterative deepening approach (scan → question → search → write) rather than a single-pass skim, and is designed to run for an extended period to produce deep analysis. The output goes well beyond what a naive "explain this code" prompt produces: it traces end-to-end request flows, detects discrepancies between documentation and actual code, documents failure modes with recovery commands for oncall engineers, and flags implicit knowledge (hardcoded values, magic numbers, undocumented assumptions) that would otherwise disappear when teams rotate.

While the plugin works with any codebase, it is optimized for AWS-deployed services. It parses CDK constructs, CloudFormation resources, and Terraform blocks as first-class application code — recognizing that in CDK, the infrastructure IS the application logic. It consults awsknowledge and awsiac MCP servers for AWS service enrichment and IaC validation, and integrates with the aws-architecture-diagram skill (deploy-on-aws plugin) to produce validated draw.io diagrams with official AWS4 icons. Failure modes include AWS-specific detection methods and recovery commands. The plugin is tool-agnostic and works on Claude Code, Cursor, Codex, and other coding assistants.

What's included

Plugin infrastructure:

  • Plugin manifest (.claude-plugin/plugin.json) and MCP server config (.mcp.json)
  • Codex marketplace entry and Codex plugin manifest (.codex-plugin/plugin.json)
  • CODEOWNERS entry and root README listing

Skill — document-service:

  • Outline-driven pipeline: file tree → outline → iterative 3-pass analysis → assembly
  • Clickable citations: every finding links to source code via markdown [file:line](./file#Lline) links
  • Discrepancy detection: cross-references README/metadata claims vs actual code
  • Actionable failure modes: detection methods + recovery commands for oncall engineers
  • Architecture diagrams: delegates to aws-architecture-diagram skill (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow diagrams and architecture overview
  • Large codebase support: tracked sequential analysis with resumable progress file; optional parallel workers when environment supports them
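
For reviewers who want a concrete picture of the discrepancy-detection idea, here is a minimal sketch. It is not the plugin's implementation (the skill is prompt-driven). It only illustrates the cross-referencing step by comparing dependencies declared in a package.json-style manifest against packages actually imported in JS/TS source. The names `IMPORT_RE`, `package_name`, and `phantom_dependencies` are hypothetical, and the regex deliberately ignores side-effect imports and dynamic `import()`.

```python
import json
import re
from typing import Optional

# Hypothetical pattern: captures the specifier in `... from "pkg"` and `require("pkg")`.
IMPORT_RE = re.compile(r"""(?:from\s+|require\(\s*)['"]([^'"]+)['"]""")


def package_name(specifier: str) -> Optional[str]:
    """Reduce an import specifier to its package name; None for relative paths."""
    if specifier.startswith("."):
        return None
    parts = specifier.split("/")
    # Scoped packages such as @aws-sdk/client-s3 keep their first two segments.
    return "/".join(parts[:2]) if specifier.startswith("@") else parts[0]


def phantom_dependencies(manifest_json: str, sources: dict) -> list:
    """Return declared dependencies that no source file imports (sorted)."""
    declared = set(json.loads(manifest_json).get("dependencies", {}))
    used = set()
    for text in sources.values():
        for specifier in IMPORT_RE.findall(text):
            name = package_name(specifier)
            if name:
                used.add(name)
    return sorted(declared - used)
```

A dependency reported here is only a candidate discrepancy; it may still be used through tooling or configuration, so each hit would need a citation-backed check before it lands in the Discrepancies section.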

Output sections: Architecture Overview, Code Analysis, Request Lifecycle, Domain Logic Deep-Dive, Startup & Initialization, Components, API Contracts, Data Models, Deployment, Configuration, Monitoring & Observability, Security, Local Development, Discrepancies, Failure Modes, Timeout/Dependency Chain, Runbook Hints, Business Context.

MCP servers:

  • awsknowledge (HTTP) — AWS service descriptions, architecture guidance
  • awsiac (stdio) — CDK/CloudFormation resource schema validation

Changes

  • Plugin manifest (.claude-plugin/plugin.json): metadata, keywords, Apache-2.0 license
  • MCP config (.mcp.json): awsknowledge (HTTP) + awsiac (stdio/uvx)
  • Skill (skills/document-service/SKILL.md): 6-step autonomous workflow with iterative deepening
  • 8 reference files: progressive disclosure for citation format, project detection, code extraction patterns, exclusion patterns, templates, error scenarios, and large codebase strategy
  • Marketplace entries in .claude-plugin/marketplace.json and .agents/plugins/marketplace.json
  • Codex manifest in .codex-plugin/plugin.json and .agents/plugins/marketplace.json
  • CODEOWNERS entry for plugins/codebase-documentor
  • README.md table entry, install command, and detailed plugin section

Evaluation

Tested blind against aws-samples/sample-deepseek-ocr-selfhost — a CDK TypeScript + Python project with 6 CDK stacks, ECS GPU inference, Lambda processing, and API Gateway. The README was removed before analysis to simulate a legacy handoff.

The plugin produced a 571-line CODEBASE_ANALYSIS.md with a draw.io architecture diagram that:

  • Found 15 discrepancies between CLAUDE.md/package.json claims and actual code (including phantom A2I/StepFunctions/DynamoDB dependencies that were declared but never implemented)
  • Traced 2 end-to-end request lifecycles with Mermaid sequence diagrams
  • Generated a draw.io architecture diagram with 11 AWS services using official AWS4 icons
  • Documented 11 failure modes with AWS-specific detection and recovery commands
  • Identified a critical timeout mismatch (29s API Gateway vs multi-minute OCR inference)
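
The timeout mismatch in the last bullet is an instance of a general check on a call chain: a caller that waits less time than its callee may need will abandon requests that are still being processed. The sketch below shows that check under stated assumptions. The `Hop` type and `timeout_mismatches` function are hypothetical, and the hop names and the 180-second figure in the usage note are illustrative stand-ins for "multi-minute"; only the 29-second API Gateway value comes from the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    timeout_s: float  # how long this hop is allowed to take or wait


def timeout_mismatches(chain: list) -> list:
    """Walk caller -> callee pairs and flag callers that time out first."""
    findings = []
    for caller, callee in zip(chain, chain[1:]):
        if caller.timeout_s < callee.timeout_s:
            findings.append(
                f"{caller.name} ({caller.timeout_s:g}s) gives up before "
                f"{callee.name} ({callee.timeout_s:g}s)"
            )
    return findings
```

For example, `timeout_mismatches([Hop("API Gateway", 29), Hop("OCR inference", 180)])` returns one finding, which is the kind of entry the Timeout/Dependency Chain section is meant to surface.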

Sample output (analysis report + draw.io diagram + SVG render): https://gist.github.com/XinyuQu/2001dff63cc5c5ab12c2f0eb1ea2a78a

Test plan

  • Trigger skill by asking to "analyze this codebase" — produces CODEBASE_ANALYSIS.md
  • Verify clickable citations in [file:line](./file#Lline) format
  • Verify Mermaid flow diagrams present (architecture + sequence diagrams)
  • Verify draw.io architecture diagram generated with AWS4 icons
  • Verify all required sections present
  • mise run lint:manifests — all 5 schemas valid
  • mise run lint:cross-refs — 0 errors, 0 warnings
  • gitleaks — no leaks found
  • bandit — 0 findings
  • semgrep — 0 findings (with repo exclusions)
  • checkov — clean
  • dprint check — clean
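
The citation check in this test plan could be partly automated. The following is a sketch only, assuming citations follow the `[file:line](./file#Lline)` form exactly and that paths are relative to the repository root. `CITATION_RE` and `broken_citations` are hypothetical names, not part of the plugin.

```python
import re
from pathlib import Path

# Matches links such as [src/app.py:42](./src/app.py#L42)
CITATION_RE = re.compile(r"\[([^\]\s:]+):(\d+)\]\(\./([^)#\s]+)#L(\d+)\)")


def broken_citations(markdown: str, repo_root: Path) -> list:
    """Return a description of every citation that does not resolve."""
    problems = []
    for label_path, label_line, link_path, link_line in CITATION_RE.findall(markdown):
        cite = f"{label_path}:{label_line}"
        # The visible label and the link target should point at the same place.
        if label_path != link_path or label_line != link_line:
            problems.append(f"{cite}: label and link target disagree")
            continue
        target = repo_root / link_path
        if not target.is_file():
            problems.append(f"{cite}: file not found")
            continue
        text = target.read_text(encoding="utf-8", errors="replace")
        line_count = len(text.splitlines())
        if not 1 <= int(link_line) <= line_count:
            problems.append(f"{cite}: line out of range (file has {line_count} lines)")
    return problems
```

Running this against CODEBASE_ANALYSIS.md confirms only that each cited file and line exists; whether the cited line actually supports the claim still needs human review.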

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Add a documentation plugin that analyzes codebases to produce a single
CODEBASE_ANALYSIS.md with source-of-truth citations. Designed for legacy
and AI-generated codebases where engineers need deep understanding to
operate, debug, and extend the system.

Key capabilities:
- Outline-driven pipeline: file tree → outline → iterative analysis → assembly
- Clickable citations: every finding links to source code via markdown links
- Discrepancy detection: cross-references README/metadata vs actual code
- Actionable failure modes: detection methods + recovery commands for oncall
- Architecture diagrams: delegates to aws-architecture-diagram skill
  (deploy-on-aws plugin) for draw.io output; Mermaid fallback for flow
  diagrams and architecture overview when skill unavailable
- Deep analysis: iterative deepening (scan → question → search → write)
- Tool-agnostic: works on Claude Code, Cursor, Codex, and other tools
- Large codebase support: tracked sequential analysis with resumable
  progress file; optional parallel workers when environment supports them

Output sections: Architecture Overview, Code Analysis, Request Lifecycle,
Domain Logic Deep-Dive, Startup & Initialization, Components, API
Contracts, Data Models, Deployment, Configuration, Monitoring &
Observability, Security, Local Development, Discrepancies, Failure Modes,
Timeout/Dependency Chain, Runbook Hints, Business Context.

Plugin structure:
- One skill: document-service (auto-triggers on documentation requests)
- Two MCP servers: awsknowledge (HTTP) and awsiac (stdio/uvx)
- 8 reference files for progressive disclosure
- Codex and Claude Code marketplace support