feat: auto-detect document links from markdown and wikilinks #396

Open
RobertLD wants to merge 1 commit into main from feat/auto-detect-document-links

@RobertLD
Owner

Summary

Closes #394

Implements automatic link detection from document content to populate the knowledge graph with explicit document relationships.

  • Link extraction: Added extractMarkdownLinks() ([text](url)) and extractWikilinks() ([[PageName]] / [[PageName|alias]]) to link-extractor.ts
  • Link resolution: Added resolveDocumentByUrl(), resolveDocumentByTitle(), and resolveDocumentLink() to links.ts for matching extracted refs to existing documents
  • New link type: Added "references" to LinkType to distinguish auto-detected links from manually curated ones
  • Automatic storage: extractAndStoreDocumentLinks() is called after indexDocument() completes (post-transaction, non-blocking)
  • Obsidian connector: syncObsidianVault() now resolves parsed wikilinks via resolveDocumentByTitle() and stores them as "references" links
  • Knowledge graph: GraphEdge.type extended with see_also | prerequisite | supersedes | related | references; buildKnowledgeGraph() now queries document_links and includes them as edges
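
The resolution order described in the list above (URL lookup first, then a case-insensitive title lookup, with self-links and unresolvable references skipped) might look roughly like the sketch below. This is an illustration under assumptions, not the code in links.ts: the `Doc` and `DocLink` shapes, the in-memory array, and the helper names `findByUrl`, `findByTitle`, and `collectReferences` are all hypothetical, and the PR's own functions presumably query the document store instead.

```typescript
// Hypothetical document shape; the real schema is not shown in this PR.
interface Doc {
  id: string;
  title: string;
  url?: string;
}

// Hypothetical link record carrying the new "references" link type.
interface DocLink {
  sourceId: string;
  targetId: string;
  linkType: "references";
}

// Exact URL match (assumed behaviour).
function findByUrl(docs: Doc[], url: string): Doc | undefined {
  return docs.find((d) => d.url === url);
}

// Case-insensitive title match, as the test plan describes.
function findByTitle(docs: Doc[], title: string): Doc | undefined {
  const wanted = title.toLowerCase();
  return docs.find((d) => d.title.toLowerCase() === wanted);
}

// Resolve each extracted reference, trying the URL first and falling back to
// the title. Self-links, unresolvable references, and repeated targets are skipped.
function collectReferences(docs: Doc[], sourceId: string, refs: string[]): DocLink[] {
  const seen = new Set<string>();
  const links: DocLink[] = [];
  for (const ref of refs) {
    const target = findByUrl(docs, ref) ?? findByTitle(docs, ref);
    if (target === undefined || target.id === sourceId || seen.has(target.id)) continue;
    seen.add(target.id);
    links.push({ sourceId, targetId: target.id, linkType: "references" });
  }
  return links;
}
```

Calling `collectReferences(docs, "a", refs)` with a mix of URLs and page titles yields at most one "references" link per distinct target and never a link from document "a" to itself.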

Test plan

  • Unit tests for extractMarkdownLinks() and extractWikilinks() — standard links, aliases, dedup, empty input
  • Unit/integration tests for resolveDocumentByUrl/Title/Link() and extractAndStoreDocumentLinks() — URL lookup, case-insensitive title lookup, fallback resolution, self-link skipping, unresolvable skipping
  • Integration test for buildKnowledgeGraph() — verifies document_links edges appear with correct type
  • All 1276 tests pass across 69 test files

🤖 Generated with Claude Code

Implements issue #394. Automatically extracts and stores document
relationships from content at index time, and surfaces them in the
knowledge graph.

- Add extractMarkdownLinks() and extractWikilinks() to link-extractor.ts
- Add "references" LinkType for auto-detected links
- Add resolveDocumentByUrl/Title/Link() and extractAndStoreDocumentLinks() to links.ts
- Call extractAndStoreDocumentLinks() after indexDocument() completes
- Wire Obsidian wikilink resolution into createLink() via syncObsidianVault()
- Extend GraphEdge.type with see_also/prerequisite/supersedes/related/references
- Include document_links edges in buildKnowledgeGraph()
- Add unit and integration tests (1276 passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vercel
vercel bot commented Mar 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project   Deployment  Actions  Updated (UTC)
libscope  Ignored     Ignored  Mar 11, 2026 11:19pm

const re = /\[([^\]]*)\]\(([^)]+)\)/g;
let match: RegExpExecArray | null;

while ((match = re.exec(content)) !== null) {

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on library input may run slow on strings starting with '[' and with many repetitions of '[\'.
This regular expression that depends on library input may run slow on strings starting with '[](' and with many repetitions of '[](('.

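One possible way to address this alert is to replace the regular expression with a single forward pass built on `String.prototype.indexOf`, so that a failed candidate never causes earlier input to be rescanned. The sketch below is an assumption about how that could look, not the fix adopted in this PR: `scanMarkdownLinks` and `MarkdownLink` are hypothetical names, and deduplication (covered by the test plan for the real extractor) is left out.

```typescript
interface MarkdownLink {
  text: string;
  url: string;
}

// Linear-time scan for [text](url) pairs. The cursor `pos` only moves forward,
// and the scan stops as soon as no closing delimiter remains in the input.
function scanMarkdownLinks(content: string): MarkdownLink[] {
  const links: MarkdownLink[] = [];
  let pos = 0;
  while (pos < content.length) {
    const open = content.indexOf("[", pos);
    if (open === -1) break;
    const close = content.indexOf("]", open + 1);
    if (close === -1) break; // no "]" left, so no later link can match either
    if (content[close + 1] !== "(") {
      pos = close + 1; // every "[" before this "]" would fail the same way
      continue;
    }
    const urlStart = close + 2;
    const urlEnd = content.indexOf(")", urlStart);
    if (urlEnd === -1) break; // no ")" left, so no later link can match either
    if (urlEnd > urlStart) {
      links.push({
        text: content.slice(open + 1, close),
        url: content.slice(urlStart, urlEnd),
      });
      pos = urlEnd + 1;
    } else {
      pos = close + 1; // empty URL, e.g. "[a]()", is not treated as a link
    }
  }
  return links;
}
```

On the inputs CodeQL names, such as a long run of '[' or of '[](', the loop exits on the first missing delimiter instead of retrying from every starting position.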
const re = /\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/g;
let match: RegExpExecArray | null;

while ((match = re.exec(content)) !== null) {

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data (High)

This regular expression that depends on library input may run slow on strings starting with '[[' and with many repetitions of '[[\'.
This regular expression that depends on library input may run slow on strings starting with '[[\|' and with many repetitions of '[[\|'.

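The wikilink pattern could be handled in a similar spirit. The sketch below is a hypothetical `scanWikilinks` helper, not code from this PR, and it makes one deliberate simplification that differs from the regular expression: only the innermost `[[` before a closing `]]` is considered, so `[[[a]]` yields `a` rather than `[a`. Whether that trade-off is acceptable for link-extractor.ts is an open question for the author.

```typescript
// Linear-time scan for [[Target]] and [[Target|alias]]; returns the targets.
// Each pass consumes the input up to the next "]" and never revisits it.
function scanWikilinks(content: string): string[] {
  const targets: string[] = [];
  let pos = 0;
  while (pos < content.length) {
    const firstOpen = content.indexOf("[[", pos);
    if (firstOpen === -1) break;
    const close = content.indexOf("]", firstOpen + 2);
    if (close === -1) break; // no "]" left, so no later wikilink can close
    pos = close + 1;
    if (content[close + 1] !== "]") continue; // a single "]" is not a closer
    // Innermost opener before the closer (simplification, see lead-in).
    const open = content.lastIndexOf("[[", close - 2);
    const inner = content.slice(open + 2, close);
    const pipe = inner.indexOf("|");
    const target = pipe === -1 ? inner : inner.slice(0, pipe);
    const aliasIsEmpty = pipe !== -1 && pipe === inner.length - 1;
    if (target === "" || aliasIsEmpty) continue; // mirrors the "+" quantifiers
    targets.push(target);
    pos = close + 2;
  }
  return targets;
}
```

Because `pos` always advances past the `]` that was just examined, the total work stays proportional to the input length, including on the '[[' and '[[\|' repetitions CodeQL reports.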
Development

Successfully merging this pull request may close these issues.

Auto-detect document links from markdown, wikilinks, and HTML
