Description
Summary
libscope has a `document_links` table and a manual `link-documents` MCP tool, but it does not detect links in document content automatically. The link graph stays empty unless users curate it by hand, and the knowledge graph visualization ignores `document_links` entirely: its only document-to-document edges come from embedding similarity.
Current State
- `src/core/link-extractor.ts`: `extractLinks(html, baseUrl): string[]` exists for HTML `<a>` tags, but it is only used by the web spider for crawling, not for populating the link graph.
- `src/connectors/obsidian.ts`: already parses `[[wikilinks]]` into a `wikilinks: string[]` array and converts them to markdown format in the body text, but never calls `createLink()`.
- `src/core/links.ts`: `createLink(db, sourceId, targetId, linkType, label?)` is ready to use.
- `src/core/graph.ts`: `GraphEdge.type` only supports `"belongs_to_topic" | "has_tag" | "similar_to"`; `document_links` rows are never read for graph edges.
Detailed Implementation Plan
Step 1 — Extend link-extractor.ts to handle markdown and wikilinks
Add two new extraction functions alongside the existing extractLinks:
```ts
// Extract [text](url) markdown links; return absolute URLs or relative paths
export function extractMarkdownLinks(content: string): string[]

// Extract [[PageName]] and [[PageName|display]] wikilinks; return raw page names
export function extractWikilinks(content: string): string[]
```

The wikilink regex already exists in `obsidian.ts` (line 150): `/(?<!!)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g`. Move it here as the canonical implementation.
For markdown links, use a simple `[text](href)` regex, filtering out image links (`![alt](src)`), `mailto:` links, and anchor-only (`#...`) hrefs.
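A minimal sketch of the two extractors, assuming the wikilink regex quoted above and a simple markdown-link regex of our own choosing (`[text](href "optional title")`, with a lookbehind to skip `![alt](src)` images). It does not handle nested brackets in link text or `<...>`-wrapped hrefs. The regex literals live inside each function, so every call gets a fresh `RegExp` and the global flag's `lastIndex` never leaks between calls.

```ts
// Sketch only: the markdown-link pattern is an assumption, not existing code.
export function extractMarkdownLinks(content: string): string[] {
  // (?<!!) skips images; group 1 is the href up to the first whitespace,
  // and [^)]* swallows an optional "title" before the closing paren.
  const re = /(?<!!)\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)/g;
  const hrefs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(content)) !== null) {
    const href = m[1];
    // Drop empty matches, mailto: links and in-page anchors (#section)
    if (!href || href.toLowerCase().startsWith('mailto:') || href.startsWith('#')) continue;
    hrefs.push(href);
  }
  return hrefs;
}

export function extractWikilinks(content: string): string[] {
  // Regex from obsidian.ts: [[Page]] or [[Page|display]]; (?<!!) skips ![[embeds]]
  const re = /(?<!!)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(content)) !== null) {
    const name = m[1]?.trim(); // group 1 is the page name, display text ignored
    if (name) names.push(name);
  }
  return names;
}
```

Note that with this wikilink regex a heading link such as `[[Page#Section]]` comes back as `Page#Section`, so the title resolver may need to strip the `#...` part.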
Step 2 — Add link resolution to src/core/links.ts
Add new functions that resolve a URL or title string to a known document ID:
```ts
export function resolveDocumentByUrl(db: Database.Database, url: string): string | null
export function resolveDocumentByTitle(db: Database.Database, title: string): string | null
```

Both do a simple DB lookup:

- `resolveDocumentByUrl`: exact match on `documents.url` (normalize the trailing slash and fragment first)
- `resolveDocumentByTitle`: case-insensitive match on `documents.title`; for wikilinks, also try slug matching (`title.toLowerCase().replace(/\s+/g, '-')`)
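The matching rules above can be sketched as pure helpers, assuming the actual DB queries are wrapped as callbacks (in `links.ts` they would presumably be prepared statements against `documents`). `normalizeUrl`, `urlCandidates`, `titleSlug`, `toAbsoluteUrl`, `resolveTarget` and `DocumentLookups` are hypothetical names. Relative hrefs are resolved with the standard `URL` constructor, and a target is tried as a URL first, falling back to a title/slug match.

```ts
// Sketch only: DB access is abstracted so the rules can be tested without SQLite.

/** Strip the #fragment and trailing slashes so equivalent URLs compare equal. */
export function normalizeUrl(url: string): string {
  const hash = url.indexOf('#');
  const noFragment = hash === -1 ? url : url.slice(0, hash);
  return noFragment.replace(/\/+$/, '');
}

/** Both spellings a stored URL might use, e.g. for `WHERE url IN (?, ?)`. */
export function urlCandidates(url: string): string[] {
  const base = normalizeUrl(url);
  return [base, `${base}/`];
}

/** Slug fallback for wikilink names: "My Page" -> "my-page". */
export function titleSlug(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, '-');
}

/** Resolve a relative href against the source URL; null if that is impossible. */
export function toAbsoluteUrl(target: string, sourceUrl?: string): string | null {
  try {
    return new URL(target, sourceUrl).href;
  } catch {
    return null; // relative target without a usable base, or malformed input
  }
}

export interface DocumentLookups {
  byUrl: (candidates: string[]) => string | null; // exact match on documents.url
  byTitle: (title: string, slug: string) => string | null; // case-insensitive title or slug
}

/** Try the target as a URL first, then fall back to a title/wikilink match. */
export function resolveTarget(
  target: string,
  sourceUrl: string | undefined,
  lookups: DocumentLookups,
): string | null {
  const absolute = toAbsoluteUrl(target, sourceUrl);
  if (absolute) {
    const id = lookups.byUrl(urlCandidates(absolute));
    if (id) return id;
  }
  return lookups.byTitle(target, titleSlug(target));
}
```

Trying the URL first matters because a bare wikilink name like `My Page` still parses as a relative URL when a base is available; it simply misses the URL lookup and falls through to the title match.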
Add a combined helper:
```ts
export function resolveDocumentLink(
  db: Database.Database,
  linkTarget: string, // URL or wikilink name
  sourceUrl?: string, // for relative URL resolution
): string | null
```

Step 3 — Wire link extraction into indexDocument in src/core/indexing.ts
After the transaction completes at line 470, add a post-index link extraction pass:
```ts
// After the transaction() call, before return:
await extractAndStoreDocumentLinks(db, docId, input.content, input.url);
```

New function `extractAndStoreDocumentLinks`:
```ts
// Uses extractLinks / extractMarkdownLinks / extractWikilinks (link-extractor.ts)
// and createLink / resolveDocumentLink (links.ts).
async function extractAndStoreDocumentLinks(
  db: Database.Database,
  docId: string,
  content: string,
  sourceUrl?: string,
): Promise<void> {
  const targets = new Set<string>();

  // Detect content type and extract accordingly
  if (content.trimStart().startsWith('<')) {
    // HTML content
    for (const url of extractLinks(content, sourceUrl ?? '')) targets.add(url);
  } else {
    // Markdown/plain text: extract both markdown links and wikilinks
    for (const url of extractMarkdownLinks(content)) targets.add(url);
    for (const name of extractWikilinks(content)) targets.add(name);
  }

  for (const target of targets) {
    const targetId = resolveDocumentLink(db, target, sourceUrl);
    // Skip unresolvable targets and self-links
    if (targetId && targetId !== docId) {
      try {
        createLink(db, docId, targetId, 'references');
      } catch (err) {
        // Assumes a UNIQUE constraint on the link row: a duplicate means the
        // link already exists. Re-throw anything else instead of hiding it.
        if ((err as { code?: string }).code !== 'SQLITE_CONSTRAINT_UNIQUE') throw err;
      }
    }
  }
}
```

The pass runs for every indexed document but only creates rows for targets that resolve to a known document; unresolvable targets and self-links are silently skipped.
Step 4 — Add "references" link type
In src/core/links.ts, add "references" to VALID_LINK_TYPES and the LinkType union. This distinguishes auto-detected content links from manually curated semantic relationships (see_also, prerequisite, etc.).
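No migration is strictly needed in `src/db/schema.ts`: `link_type` is a plain TEXT column and SQLite has no enum type, so the app-level `VALID_LINK_TYPES` set is sufficient. A migration (next version after current) is only worth adding if a database-level CHECK constraint on `link_type` is wanted; SQLite does enforce CHECK constraints, but changing one later requires rebuilding the table.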
Step 5 — Wire Obsidian connector wikilinks
In src/connectors/obsidian.ts, after indexDocument() returns at line 340, call the new link extraction function using the already-parsed parsed.wikilinks array:
```ts
// After the indexDocument() call:
for (const wikilink of parsed.wikilinks) {
  const targetId = resolveDocumentByTitle(db, wikilink);
  if (targetId && targetId !== indexed.id) {
    try {
      createLink(db, indexed.id, targetId, 'references');
    } catch (err) {
      // A duplicate link (assumed UNIQUE constraint) is fine; surface anything else
      if ((err as { code?: string }).code !== 'SQLITE_CONSTRAINT_UNIQUE') throw err;
    }
  }
}
```

Note: wikilinks in Obsidian refer to other vault files by title/filename, so `resolveDocumentByTitle` is the right resolver here. Because vault sync indexes files one at a time, a wikilink whose target file has not been indexed yet cannot be resolved when its source file is processed, so no link is created for it. A second pass or a re-index after the full sync would pick these up; consider adding a post-sync link resolution sweep.
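The post-sync sweep could be a second pass over wikilinks collected during the sync. A minimal sketch, assuming each file's wikilinks are recorded as `{ sourceId, target }` pairs while indexing; `PendingWikilink` and `resolveWikilinksAfterSync` are hypothetical names, and the `resolve` / `link` callbacks stand in for `resolveDocumentByTitle(db, name)` and `createLink(db, sourceId, targetId, 'references')`.

```ts
// Sketch only: names and the callback-based shape are assumptions.
export interface PendingWikilink {
  sourceId: string; // document that contains the [[wikilink]]
  target: string;   // raw wikilink page name
}

/** Resolve all collected wikilinks once every file is indexed; returns leftovers. */
export function resolveWikilinksAfterSync(
  pending: PendingWikilink[],
  resolve: (title: string) => string | null,
  link: (sourceId: string, targetId: string) => void,
): PendingWikilink[] {
  const unresolved: PendingWikilink[] = [];
  for (const p of pending) {
    const targetId = resolve(p.target);
    if (targetId === null) {
      unresolved.push(p); // target is not in the index at all
    } else if (targetId !== p.sourceId) {
      link(p.sourceId, targetId); // skip self-links, create everything else
    }
  }
  return unresolved;
}
```

Running the sweep once after all files are indexed makes link creation independent of file order; whatever it returns points outside the indexed vault and could be logged.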
Step 6 — Add document_links edges to the knowledge graph
In src/core/graph.ts, after building similar_to edges (line ~263), add:
```ts
// Add explicit document_links edges
const allLinks = listLinks(db); // existing function in links.ts
for (const link of allLinks) {
  if (nodeIds.has(link.sourceId) && nodeIds.has(link.targetId)) {
    edges.push({
      source: link.sourceId,
      target: link.targetId,
      type: link.linkType as GraphEdge['type'], // extend union
      weight: 1,
    });
  }
}
```

Extend `GraphEdge.type` to include `"see_also" | "prerequisite" | "supersedes" | "related" | "references"`.
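One way to avoid the `as GraphEdge['type']` cast is to validate `linkType` before building the edge, so an unexpected value in the table is skipped instead of reaching the visualization. A minimal sketch, assuming the row shape `{ sourceId, targetId, linkType }` used in the loop above and a `GraphEdge` with only `source`, `target`, `type` and `weight` (the real interface may carry more fields); `DOCUMENT_LINK_TYPES`, `isDocumentLinkType` and `documentLinkEdges` are hypothetical names.

```ts
// Sketch only: the extended union plus a type guard in place of the cast.
const DOCUMENT_LINK_TYPES = ['see_also', 'prerequisite', 'supersedes', 'related', 'references'] as const;
type DocumentLinkType = (typeof DOCUMENT_LINK_TYPES)[number];

interface GraphEdge {
  source: string;
  target: string;
  type: 'belongs_to_topic' | 'has_tag' | 'similar_to' | DocumentLinkType;
  weight: number;
}

interface DocumentLinkRow {
  sourceId: string;
  targetId: string;
  linkType: string; // stored as TEXT, so not guaranteed to be a known type
}

function isDocumentLinkType(t: string): t is DocumentLinkType {
  return (DOCUMENT_LINK_TYPES as readonly string[]).includes(t);
}

/** Turn document_links rows into edges between nodes already in the graph. */
export function documentLinkEdges(links: DocumentLinkRow[], nodeIds: Set<string>): GraphEdge[] {
  const edges: GraphEdge[] = [];
  for (const link of links) {
    if (!nodeIds.has(link.sourceId) || !nodeIds.has(link.targetId)) continue;
    if (!isDocumentLinkType(link.linkType)) continue; // unknown type: skip, don't cast
    edges.push({ source: link.sourceId, target: link.targetId, type: link.linkType, weight: 1 });
  }
  return edges;
}
```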
Step 7 — Tests
- Unit test `extractMarkdownLinks` and `extractWikilinks` in `tests/unit/`
- Unit test `resolveDocumentByUrl` and `resolveDocumentByTitle`
- Integration test: index two documents where doc A links to doc B's URL → verify a `document_links` row is created with `link_type = "references"`
- Integration test: Obsidian sync with two files where file A wikilinks to file B → verify the link is created
- Integration test: the graph includes `document_links` edges
Files to Modify
| File | Change |
|---|---|
| `src/core/link-extractor.ts` | Add `extractMarkdownLinks()`, `extractWikilinks()` |
| `src/core/links.ts` | Add `"references"` type, `resolveDocumentByUrl()`, `resolveDocumentByTitle()`, `resolveDocumentLink()` |
| `src/core/indexing.ts` | Call `extractAndStoreDocumentLinks()` after the transaction |
| `src/connectors/obsidian.ts` | Wire `parsed.wikilinks` to `createLink()` after indexing |
| `src/core/graph.ts` | Add `document_links` edges; extend the `GraphEdge.type` union |
| `src/db/schema.ts` | No schema change needed (`link_type` is TEXT, enforced at app level) |
| `tests/unit/` | Tests for the new extractor and resolver functions |
| `tests/integration/` | End-to-end link detection tests |