docs: add embedding troubleshooting for stalled flows #327

Open
mason5052 wants to merge 1 commit into vxcontrol:main from mason5052:codex/issue-322-embedding-troubleshooting

@mason5052
Contributor

Summary

Adds a troubleshooting subsection to the Embedding Configuration and Testing section of the README for the case where a flow starts but then waits indefinitely with no progress. Documentation only.

Problem

In #322 a user reported that a flow "just keeps waiting, nothing moving ahead." As noted on the issue, this symptom is frequently caused by a misconfigured or unreachable embedding provider rather than by the flow itself, and the suggested first step is to check docker logs pentagi and the embedding configuration guide. The README documents how to configure and test embeddings, but it does not connect the "flow hangs / never progresses" symptom to embedding provider settings, so users debug the flow instead of the provider.

Solution

A new ### Troubleshooting: Flow Stalls or Hangs Without Progress subsection is added at the end of the Embedding Configuration and Testing section. It gives a short, ordered diagnostic path that matches the maintainer guidance on the issue:

  1. Check docker logs pentagi for embedding errors (auth, wrong model, timeout, TLS).
  2. Run etester test -verbose to validate the embedding provider and database connection without starting a flow.
  3. Verify the relevant .env settings, with cross-links to the existing Supported Embedding Providers and Why Consistent Embedding Providers Matter subsections.
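
For reviewers who want to try the first two steps by hand, here is a minimal sketch. The `filter_embedding_errors` helper and its keyword list are assumptions made for illustration; they are not taken from PentAGI's actual log format, so adjust the pattern to whatever your logs show. How `etester` is invoked depends on the deployment, so it appears only as a comment, exactly as written in step 2.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that look embedding-related.
# The keyword list is a guess (auth, timeout, TLS); tune it for your logs.
filter_embedding_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'embedding|unauthorized|timeout|tls|x509' || true
}

# Step 1: scan the container logs (merge stderr so nothing is missed).
#   docker logs pentagi 2>&1 | filter_embedding_errors
#
# Step 2: validate the embedding provider and database connection
# without starting a flow.
#   etester test -verbose
```

An empty result from the filter does not prove the provider is healthy; it only means none of the assumed keywords appeared, which is why step 2 is still worth running.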

Every environment variable referenced (EMBEDDING_PROVIDER, EMBEDDING_MODEL, EMBEDDING_URL, EMBEDDING_KEY, OPEN_AI_KEY, OPEN_AI_SERVER_URL, PROXY_URL, HTTP_CLIENT_TIMEOUT) already exists in both .env.example and backend/pkg/config/config.go. The LLM-provider fallback note mirrors the behavior already documented just above in the same section.
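
As a rough aid for step 3, the sketch below lists which of these variables have no non-empty assignment in a given .env file. The function name `report_unset_vars` is hypothetical, and the check is deliberately simple: it assumes plain `NAME=value` lines with no `export` prefix or quoting. An unset variable is not necessarily a problem, since some settings are optional or fall back to the LLM-provider values.

```shell
#!/bin/sh
# Hypothetical helper: print each embedding-related variable that has
# no non-empty "NAME=value" line in the given .env file.
report_unset_vars() {
    env_file=$1
    for name in EMBEDDING_PROVIDER EMBEDDING_MODEL EMBEDDING_URL EMBEDDING_KEY \
                OPEN_AI_KEY OPEN_AI_SERVER_URL PROXY_URL HTTP_CLIENT_TIMEOUT; do
        # Require at least one character after "=" to count as set.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "$name"
        fi
    done
}

# Example (commented out so the snippet has no side effects):
#   report_unset_vars .env
```

The output is a starting point for the checklist, not a verdict: compare it against the provider you actually intend to use.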

User Impact

Documentation only. No new environment variables, no provider default changes, and no code or test changes. Users who hit a stalled flow get a direct, ordered path to the likely embedding-provider cause instead of debugging the flow.

Test Plan

  • git diff --check reports no whitespace errors
  • Diff is a single file (README.md, 29 insertions); no code, schema, compose, or .env.example changes
  • Every referenced env var verified present in both .env.example and backend/pkg/config/config.go
  • No new environment variables and no provider defaults changed
  • Internal anchor links (#embedding-tester-utility-etester, #supported-embedding-providers, #why-consistent-embedding-providers-matter) resolve to existing headings
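
The anchor check in the last item can be approximated with a small script. This is a sketch under assumptions: `heading_slugs` and `has_anchor` are hypothetical names, and the slug rule used here (lowercase, drop punctuation, spaces to hyphens) only approximates how GitHub derives heading anchors. It also does not skip `#` lines inside fenced code blocks.

```shell
#!/bin/sh
# Hypothetical helper: print an approximate GitHub-style slug for every
# ATX heading ("# ..." through "###### ...") in a Markdown file.
heading_slugs() {
    grep -E '^#{1,6} ' "$1" \
        | sed -E 's/^#+ +//' \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9 -]//g; s/ /-/g'
}

# Succeeds (exit 0) if the file has a heading whose slug equals $2.
has_anchor() {
    heading_slugs "$1" | grep -qx -- "$2"
}

# Example (commented out so the snippet has no side effects):
#   has_anchor README.md supported-embedding-providers && echo "anchor ok"
```

Running `has_anchor` once per linked anchor gives a quick pass or fail, but a final click-through on the rendered README remains the authoritative check.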

Refs #322

Add a troubleshooting subsection to the Embedding Configuration and
Testing section for the common case where a flow starts but then waits
indefinitely with no subtasks progressing. As noted on the issue, this
is frequently caused by a misconfigured or unreachable embedding
provider rather than by the flow itself.

The subsection points users to "docker logs pentagi" as the first
check, to the etester "test" command to validate the embedding provider
and database connection without starting a flow, and to the specific
.env settings to verify (EMBEDDING_PROVIDER, EMBEDDING_MODEL,
EMBEDDING_URL, EMBEDDING_KEY, the LLM-provider fallback, PROXY_URL, and
HTTP_CLIENT_TIMEOUT). Documentation only: no new environment variables,
no provider default changes, and no code or test changes.

Refs vxcontrol#322

Copilot AI review requested due to automatic review settings June 1, 2026 21:22
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a README troubleshooting section to help diagnose flows that stall/hang due to embedding provider misconfiguration or connectivity issues.

Changes:

  • Documented symptoms and likely root cause (embedding calls failing/hanging) for stalled flows
  • Added step-by-step diagnostics: container logs + etester validation
  • Added configuration checklist and guidance on reindex/flush when switching providers

Comment thread README.md
Comment thread README.md

2 participants