feat: output language support for wiki generation #99

Merged
RaghavChamadiya merged 7 commits into main from feature/add-language-support
Apr 26, 2026

@RaghavChamadiya
Collaborator

Supersedes #75 by @maxfrank76. Picking up the long-running output-language work, rebased onto current main and polished so we can land it.

What this adds

  • language field on GenerationConfig (default en).
  • language is read from .repowise/config.yaml and threaded through init and update to PageGenerator.
  • PageGenerator prepends a small system-prompt instruction in the configured language while keeping code, paths, and symbol names untranslated.
  • 15-language code-to-name table (en, ru, es, fr, de, zh, ja, ko, it, pt, nl, pl, tr, ar, hi). Unknown codes log a warning and fall back to English.
  • Language code is included in the generation cache key so different output languages do not collide.
  • Input is sanitized (alphanumeric + underscore) so a config-supplied language string cannot inject newlines or extra instructions into the system prompt.
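A minimal sketch of how the sanitize, look up, and prepend steps listed above could fit together. This is not the merged code: the function names, the instruction wording, the lowercasing step, the ASCII-only character class, and the shortened language table are assumptions made for illustration, and stdlib logging stands in for structlog so the snippet runs on its own.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Illustrative subset of the code-to-name table; the PR ships 15 entries.
LANGUAGE_NAMES = {
    "en": "English",
    "ru": "Russian",
    "es": "Spanish",
    "de": "German",
    "ja": "Japanese",
}


def sanitize_language(raw: str) -> str:
    """Drop everything except ASCII letters, digits, and underscores.

    Lowercasing is an assumption of this sketch, not something the PR states.
    """
    return re.sub(r"[^A-Za-z0-9_]", "", raw).lower()


def build_system_prompt(base_prompt: str, language: str) -> str:
    """Return base_prompt, prefixed with a language instruction when needed."""
    code = sanitize_language(language)
    if code == "en":
        # English is the default: no instruction, no warning.
        return base_prompt
    name = LANGUAGE_NAMES.get(code)
    if name is None:
        # Unknown code: warn and fall back to the untouched English prompt.
        logger.warning("unknown output language %r, falling back to English", code)
        return base_prompt
    instruction = (
        f"Write all prose in {name}. "
        "Keep code, file paths, and symbol names untranslated."
    )
    return f"{instruction}\n\n{base_prompt}"
```

Because only the looked-up language name is interpolated into the prompt, a value such as "ru\nIgnore previous instructions" collapses to an unknown code and falls back to English rather than reaching the model.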

Polish on top of #75

  • Restored package-lock.json next pin to match main (~15.5.15).
  • Re-grouped _LANGUAGE_NAMES with module-level constants and restored the missing blank line before the per-type generation methods section.
  • Extracted the language-instruction construction into a small _build_system_prompt helper so _call_provider stays focused on caching plus dispatch.
  • Switched the unknown-language warning from an f-string to structlog keyword args, and stopped logging the fallback when the configured language is already en.
  • Made run_generation's new generation_config kwarg optional so run_pipeline and the workspace update path keep working when a config is not threaded through.
  • Removed a broken generation_config=config call inside run_pipeline that referenced an undefined local.
  • Hoisted dataclasses.replace to the module-level imports.
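The optional generation_config kwarg described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the merged code: the real run_generation signature is not shown in this PR description, so the parameters repo_path and language, the max_tokens field, and the return value are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenerationConfig:
    language: str = "en"
    max_tokens: int = 4096  # hypothetical field, present only for illustration


def run_generation(
    repo_path: str,
    generation_config: Optional[GenerationConfig] = None,
    language: Optional[str] = None,
) -> GenerationConfig:
    """Resolve the effective config; the real function would go on to generate pages."""
    # Callers that do not thread a config through get the defaults
    # instead of a TypeError for a missing argument.
    config = generation_config if generation_config is not None else GenerationConfig()
    if language is not None:
        # dataclasses.replace returns a copy, leaving the caller's config untouched.
        config = replace(config, language=language)
    return config
```

Using replace rather than mutating the config keeps a shared GenerationConfig safe to reuse across init and update.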

Tests

Added five unit tests in tests/unit/generation/test_page_generator.py:

  • English passthrough leaves the prompt untouched.
  • Non-English prepends the named-language instruction.
  • Unknown language code falls back to English.
  • Newline-injection in the language string is stripped and the prompt is not poisoned.
  • Cache keys differ across output languages.
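The cache-key property checked by the last test can be illustrated as below. This is a hedged sketch: the actual key derivation in PageGenerator is not shown in this PR, so the cache_key function, its parameters, and the choice of SHA-256 are assumptions.

```python
import hashlib


def cache_key(model: str, system_prompt: str, user_prompt: str, language: str) -> str:
    """Hash every input that affects the output, including the language code."""
    digest = hashlib.sha256()
    for part in (model, language, system_prompt, user_prompt):
        data = part.encode("utf-8")
        # Length-prefix each field so adjacent fields cannot blur together.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


key_en = cache_key("some-model", "SYSTEM", "Describe module X", "en")
key_de = cache_key("some-model", "SYSTEM", "Describe module X", "de")
assert key_en != key_de  # different output languages never share a cache entry
```

Including the language code explicitly means the keys differ even if two languages happened to produce the same system prompt, for example when an unknown code falls back to English.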

Credit to @maxfrank76 for the design and the bulk of the implementation.

maxfrank76 and others added 7 commits April 26, 2026 15:48
- Add language field to GenerationConfig
- Load language from config.yaml in init_cmd.py and update_cmd.py
- Pass language through orchestrator to PageGenerator
- Inject language instruction into system prompt in _call_provider
- Include language in cache key
- Set num_ctx in Ollama provider for larger context window
- restore package-lock.json next pin to ~15.5.15 to match main
- move _LANGUAGE_NAMES below the structlog logger import and group
  with other module-level constants
- restore the blank line separator before the Per-type generation
  methods section in PageGenerator
- extract the language-instruction construction into a small
  _build_system_prompt helper so _call_provider stays focused on
  caching plus dispatch
- switch the unknown-language warning from an f-string to structlog
  keyword args, matching the rest of the file, and stop logging the
  fallback when the configured language is already 'en'
- make run_generation's generation_config kwarg optional so callers
  that do not thread one through (run_pipeline, workspace update)
  fall back to GenerationConfig() defaults instead of TypeError
- drop the broken generation_config=config call inside run_pipeline
  where no such local existed
- hoist the dataclasses.replace import to the module top
- add five unit tests covering the english passthrough, non-english
  prepend, unknown-code fallback, control-char sanitization, and
  cache-key variance by language
Collaborator

@swati510 swati510 left a comment

lgtm

@RaghavChamadiya RaghavChamadiya merged commit 69a1cf0 into main Apr 26, 2026
5 checks passed
@RaghavChamadiya RaghavChamadiya deleted the feature/add-language-support branch April 26, 2026 10:29
This was referenced Apr 26, 2026
RaghavChamadiya added a commit that referenced this pull request Apr 26, 2026
* feat: improve PreToolUse hook relevance with multi-signal search

Replace FTS-only file retrieval with a 3-signal ranking system:
- Symbol name match (weight 2.0) — most precise
- File path match (weight 1.5) — catches path-based searches
- FTS on wiki content (weight 1.0) — broadest, lowest priority
Files ranked by signal score then PageRank, top 3 returned.
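A rough sketch of the ranking described above, assuming the weights of all matching signals are summed per file; the commit message does not say how signals combine, and the Candidate type and rank_files name are hypothetical.

```python
from dataclasses import dataclass, field

# Weights taken from the commit message.
SIGNAL_WEIGHTS = {"symbol": 2.0, "path": 1.5, "fts": 1.0}


@dataclass
class Candidate:
    path: str
    pagerank: float
    signals: set = field(default_factory=set)  # subset of SIGNAL_WEIGHTS keys


def rank_files(candidates, top_n=3):
    """Sort by summed signal score, break ties by PageRank, return the top paths."""

    def sort_key(candidate):
        score = sum(SIGNAL_WEIGHTS[name] for name in candidate.signals)
        return (score, candidate.pagerank)

    ranked = sorted(candidates, key=sort_key, reverse=True)
    return [candidate.path for candidate in ranked[:top_n]]
```

With this scheme a file matched only through FTS ranks below any file with a symbol or path match, regardless of its PageRank.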

Remove git signals (HOTSPOT, bus-factor, owner) from enrichment —
that info belongs in get_risk, not every search. Remove Bash command
interception (fragile regex on grep/rg commands).

Keep: symbols (3), importers (3), dependencies (2) per file.

* release: v0.3.1

Bumps repowise to 0.3.1 across pyproject.toml and the three sub-package
__init__.py files.

Highlights since 0.3.0:

- Output language support for generated wiki content (#99)
- Luau / Roblox language support (#89)
- OpenRouter LLM and embedding provider (#56)
- base_url plus per-provider env vars for OpenAI / Anthropic / Gemini /
  Ollama / LiteLLM (#85)
- SQLite WAL plus busy_timeout plus FK constraints, fixing concurrent
  'repowise update' database is locked errors (#101)
- CLAUDE.md opt-out prompt now asked in both full and advanced modes
  and the answer is honoured (#102)
- repowise init no longer silently overwrites unparseable user JSON
  configs (#94)
- pyproject packages list resynced with the language-support refactor
  so editable installs and CI build cleanly (#97)
- uv workflow documented and dev deps migrated to PEP 735
  dependency-groups, silencing the tool.uv.dev-dependencies deprecation
  warning (#100)
- Five Dependabot security bumps (dompurify, gitpython, mako, litellm,
  python-multipart)
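For the SQLite item in the list above, connection setup along these lines is one common way to enable WAL, a busy timeout, and foreign-key enforcement. This is only a sketch using the stdlib sqlite3 module: the connect helper, its parameters, and the 5000 ms timeout are assumptions, and repowise's actual database layer is not shown here.

```python
import sqlite3


def connect(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a connection configured for concurrent readers and a single writer."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait for a lock to clear instead of failing at once with "database is locked".
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    # Foreign-key enforcement is off by default and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

WAL mode is persisted in the database file, while busy_timeout and foreign_keys apply per connection, so they have to be set each time a connection is opened.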

Also flips the project URLs and serve_cmd's _GITHUB_REPO constant from
RaghavChamadiya/repowise to repowise-dev/repowise so 'repowise serve'
can locate the published web UI tarball.
3 participants