Auto-detect document links from markdown, wikilinks, and HTML

## Summary

libscope has a `document_links` table and a manual `link-documents` MCP tool, but it does not detect links in document content automatically. The link graph stays empty unless users curate it by hand, and the knowledge graph visualization ignores `document_links` entirely (using only embedding similarity edges).

## Current State

- `src/core/link-extractor.ts`: `extractLinks(html, baseUrl): string[]` exists for HTML `<a>` tags, but it is only used by the web spider for crawling, not for populating the link graph
- `src/connectors/obsidian.ts`: already parses `[[wikilinks]]` into a `wikilinks: string[]` array and converts them to markdown format in the body text, but **never calls `createLink()`**
- `src/core/links.ts`: `createLink(db, sourceId, targetId, linkType, label?)` is ready to use
- `src/core/graph.ts`: `GraphEdge.type` only supports `"belongs_to_topic" | "has_tag" | "similar_to"`; `document_links` rows are never read for graph edges

## Detailed Implementation Plan

### Step 1 — Extend `link-extractor.ts` to handle markdown and wikilinks

Add two new extraction functions alongside the existing `extractLinks`:

```typescript
// Extract [text](url) markdown links — return absolute URLs or relative paths
export function extractMarkdownLinks(content: string): string[]

// Extract [[PageName]] and [[PageName|display]] wikilinks — return raw page names
export function extractWikilinks(content: string): string[]
```

The wikilink regex already exists in `obsidian.ts` (line 150): `/(?<!!)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g` — move it here as the canonical implementation.
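
As a rough illustration of how the relocated regex could back `extractWikilinks`, here is a minimal sketch. The trimming and de-duplication of page names are assumptions added for the example, not behavior confirmed by `obsidian.ts`.

```typescript
// Wikilink pattern quoted above from obsidian.ts.
// The negative lookbehind skips ![[embeds]]; group 1 is the page name, group 2 the optional alias.
const WIKILINK_RE = /(?<!!)\[\[([^\]|]+)(?:\|([^\]]*))?\]\]/g;

export function extractWikilinks(content: string): string[] {
  const names: string[] = [];
  const seen = new Set<string>();
  // A fresh RegExp per call avoids sharing lastIndex state between calls.
  const re = new RegExp(WIKILINK_RE.source, "g");
  let match: RegExpExecArray | null;
  while ((match = re.exec(content)) !== null) {
    // Assumption: surrounding whitespace is not significant in a page name.
    const name = match[1].trim();
    if (name.length > 0 && !seen.has(name)) {
      seen.add(name);
      names.push(name);
    }
  }
  return names;
}
```

Note that a target such as `[[Page#Heading]]` would come back as `Page#Heading` in this sketch; whether to strip the heading part is left open.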

For markdown links, use a simple `[text](href)` regex, filtering out image links (`![...](...)`), mailto:, and anchor-only (`#...`) hrefs.
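
One possible shape for the markdown extractor is sketched below. The regex is an assumption written for this example, and it deliberately ignores reference-style links, URLs containing parentheses, and links inside code fences.

```typescript
// Group 1 is an optional leading "!" (image marker); group 2 is the href.
// An optional double-quoted title after the href is tolerated and discarded.
const MD_LINK_RE = /(!?)\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)/g;

export function extractMarkdownLinks(content: string): string[] {
  const hrefs: string[] = [];
  const re = new RegExp(MD_LINK_RE.source, "g");
  let match: RegExpExecArray | null;
  while ((match = re.exec(content)) !== null) {
    const isImage = match[1] === "!";
    const href = match[2];
    if (isImage) continue; // ![alt](src) is an image, not a link
    if (href.startsWith("#")) continue; // anchor-only link within the same document
    if (href.toLowerCase().startsWith("mailto:")) continue;
    if (hrefs.indexOf(href) === -1) hrefs.push(href); // keep first occurrence only
  }
  return hrefs;
}
```

Relative paths are returned unchanged here, on the assumption that the resolver in Step 2 turns them into absolute URLs.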

---

### Step 2 — Add link resolution to `src/core/links.ts`

Add two new functions that each resolve a URL or title string to a known document ID:

```typescript
export function resolveDocumentByUrl(db: Database.Database, url: string): string | null
export function resolveDocumentByTitle(db: Database.Database, title: string): string | null
```

Both do a simple DB lookup:
- `resolveDocumentByUrl`: exact match on `documents.url` (normalize trailing slash/fragment first)
- `resolveDocumentByTitle`: case-insensitive match on `documents.title`; for wikilinks, also try slug matching (`title.toLowerCase().replace(/\s+/g, '-')`)

Add a combined helper:
```typescript
export function resolveDocumentLink(
  db: Database.Database,
  linkTarget: string, // URL or wikilink name
  sourceUrl?: string,  // for relative URL resolution
): string | null
```
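
To make the matching rules concrete, the sketch below applies them to an in-memory array instead of the database. `DocRow`, `normalizeUrl`, `slugify`, and the `resolveBy*` names are hypothetical stand-ins for illustration; the real functions would query `documents` through `db`, and relative URL resolution against `sourceUrl` is left out.

```typescript
// Hypothetical row shape; the real documents table may have more columns.
interface DocRow {
  id: string;
  url: string | null;
  title: string;
}

// Drop the #fragment and a trailing slash so equivalent URLs compare equal.
export function normalizeUrl(url: string): string {
  const hash = url.indexOf("#");
  const withoutFragment = hash === -1 ? url : url.slice(0, hash);
  return withoutFragment.length > 1 && withoutFragment.endsWith("/")
    ? withoutFragment.slice(0, -1)
    : withoutFragment;
}

// Slug form from the plan: lowercase, whitespace runs become hyphens.
export function slugify(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, "-");
}

export function resolveByUrl(docs: DocRow[], url: string): string | null {
  const wanted = normalizeUrl(url);
  const hit = docs.find((d) => d.url !== null && normalizeUrl(d.url) === wanted);
  return hit ? hit.id : null;
}

export function resolveByTitle(docs: DocRow[], title: string): string | null {
  const lower = title.trim().toLowerCase();
  // Case-insensitive title match first, then fall back to slug comparison.
  const exact = docs.find((d) => d.title.toLowerCase() === lower);
  if (exact) return exact.id;
  const slug = slugify(title);
  const bySlug = docs.find((d) => slugify(d.title) === slug);
  return bySlug ? bySlug.id : null;
}

// Combined helper: targets with a scheme go to the URL resolver, everything else to titles.
export function resolveLink(docs: DocRow[], target: string): string | null {
  return /^[a-z][a-z0-9+.-]*:\/\//i.test(target)
    ? resolveByUrl(docs, target)
    : resolveByTitle(docs, target);
}
```

With this shape, a wikilink written as `getting-started` and a document titled `Getting Started` resolve to the same ID, which is the case the slug fallback is meant to cover.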

---

### Step 3 — Wire link extraction into `indexDocument` in `src/core/indexing.ts`

After the transaction completes at line 470, add a post-index link extraction pass:

```typescript
// After transaction() call, before return:
await extractAndStoreDocumentLinks(db, docId, input.content, input.url);
```

New function `extractAndStoreDocumentLinks`:

```typescript
async function extractAndStoreDocumentLinks(
  db: Database.Database,
  docId: string,
  content: string,
  sourceUrl?: string,
): Promise<void> {
  const targets = new Set<string>();

  // Detect content type and extract accordingly
  if (content.trimStart().startsWith('<')) {
    // HTML content
    for (const url of extractLinks(content, sourceUrl ?? '')) targets.add(url);
  } else {
    // Markdown/plain — extract both markdown links and wikilinks
    for (const url of extractMarkdownLinks(content)) targets.add(url);
    for (const name of extractWikilinks(content)) targets.add(name);
  }

  for (const target of targets) {
    const targetId = resolveDocumentLink(db, target, sourceUrl);
    if (targetId && targetId !== docId) {
      try {
        createLink(db, docId, targetId, 'references');
      } catch {
        // UNIQUE constraint violation = link already exists, skip
      }
    }
  }
}
```

The pass runs on every `indexDocument` call. Targets that do not resolve to a known document are skipped silently, as are self-links and links that already exist.

---

### Step 4 — Add `"references"` link type

In `src/core/links.ts`, add `"references"` to `VALID_LINK_TYPES` and the `LinkType` union. This distinguishes auto-detected content links from manually curated semantic relationships (`see_also`, `prerequisite`, etc.).

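In `src/db/schema.ts`, no migration is required: `link_type` is a plain TEXT column with no enum enforcement at the database level, so the app-level set is sufficient. If database-level enforcement is wanted, add a migration (next version after current) that introduces or updates a CHECK constraint.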
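
The app-level check could look roughly like the sketch below. The actual shape of `VALID_LINK_TYPES` in `links.ts` is not shown in this issue, so the `as const` array and the `isValidLinkType` guard are assumptions; the four existing type names are taken from the `GraphEdge.type` extension listed in Step 6.

```typescript
// Existing manual link types plus the new auto-detected "references" type.
export const VALID_LINK_TYPES = [
  "see_also",
  "prerequisite",
  "supersedes",
  "related",
  "references",
] as const;

// Union derived from the array so the two cannot drift apart.
export type LinkType = (typeof VALID_LINK_TYPES)[number];

// Hypothetical guard for validating untyped input (for example, MCP tool arguments).
export function isValidLinkType(value: string): value is LinkType {
  return (VALID_LINK_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

Deriving the union from the array means adding a type is a one-line change.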

---

### Step 5 — Wire Obsidian connector wikilinks

In `src/connectors/obsidian.ts`, after `indexDocument()` returns at line 340, resolve each entry of the already-parsed `parsed.wikilinks` array and create a link for every match:

```typescript
// After indexDocument call:
for (const wikilink of parsed.wikilinks) {
  const targetId = resolveDocumentByTitle(db, wikilink);
  if (targetId && targetId !== indexed.id) {
    try {
      createLink(db, indexed.id, targetId, 'references');
    } catch {
      // UNIQUE constraint violation = link already exists, skip
    }
  }
}
```

Note: wikilinks in Obsidian refer to other vault files by title or filename, so `resolveDocumentByTitle` is the right resolver here. Because vault sync processes files in sequence, a wikilink whose target has not been indexed yet cannot be resolved on the first pass, and no link is created for it. A post-sync link resolution sweep, or a re-index after the full sync, would pick these up.
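
One way the post-sync sweep might be structured is sketched below. `WikilinkSweep`, `defer`, and `flush` are hypothetical names, and the resolver and link creator are passed in as callbacks so the example runs without a database.

```typescript
interface PendingWikilink {
  sourceId: string;
  targetName: string;
}

export class WikilinkSweep {
  private pending: PendingWikilink[] = [];

  // Called during sync whenever a wikilink target is not indexed yet.
  defer(sourceId: string, targetName: string): void {
    this.pending.push({ sourceId, targetName });
  }

  // Called once after the whole vault is indexed; returns how many links were created.
  flush(
    resolveByTitle: (name: string) => string | null,
    createLink: (sourceId: string, targetId: string) => void,
  ): number {
    let created = 0;
    const stillPending: PendingWikilink[] = [];
    for (const item of this.pending) {
      const targetId = resolveByTitle(item.targetName);
      if (targetId === null) {
        stillPending.push(item); // target never appeared in the vault
        continue;
      }
      if (targetId !== item.sourceId) {
        createLink(item.sourceId, targetId);
        created++;
      }
    }
    this.pending = stillPending;
    return created;
  }

  unresolvedCount(): number {
    return this.pending.length;
  }
}
```

In the connector, the loop above would call `defer` when `resolveDocumentByTitle` returns `null`, and the sync routine would call `flush` once at the end. Entries that remain unresolved point at pages that do not exist in the vault.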

---

### Step 6 — Add `document_links` edges to the knowledge graph

In `src/core/graph.ts`, after building `similar_to` edges (line ~263), add:

```typescript
// Add explicit document_links edges
const allLinks = listLinks(db); // existing function in links.ts
for (const link of allLinks) {
  if (nodeIds.has(link.sourceId) && nodeIds.has(link.targetId)) {
    edges.push({
      source: link.sourceId,
      target: link.targetId,
      type: link.linkType as GraphEdge['type'], // extend union
      weight: 1,
    });
  }
}
```

Extend `GraphEdge.type` to include `"see_also" | "prerequisite" | "supersedes" | "related" | "references"`.
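
A sketch of the extended union and the edge-building step follows, using simplified stand-ins for `GraphEdge` and the rows returned by `listLinks`. The real interfaces in `graph.ts` and `links.ts` may carry more fields, and `linkEdges` is a hypothetical helper name.

```typescript
type StructuralEdgeType = "belongs_to_topic" | "has_tag" | "similar_to";
type LinkEdgeType = "see_also" | "prerequisite" | "supersedes" | "related" | "references";

// Simplified stand-in for the GraphEdge interface in graph.ts.
interface GraphEdge {
  source: string;
  target: string;
  type: StructuralEdgeType | LinkEdgeType;
  weight: number;
}

// Simplified stand-in for a document_links row as returned by listLinks.
interface DocumentLink {
  sourceId: string;
  targetId: string;
  linkType: LinkEdgeType;
}

// Keep only links whose endpoints are both present as graph nodes.
export function linkEdges(links: DocumentLink[], nodeIds: Set<string>): GraphEdge[] {
  return links
    .filter((l) => nodeIds.has(l.sourceId) && nodeIds.has(l.targetId))
    .map((l) => ({ source: l.sourceId, target: l.targetId, type: l.linkType, weight: 1 }));
}
```

Typing `linkType` as the link-type union, rather than `string`, would make the `as GraphEdge['type']` cast unnecessary.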

---

### Step 7 — Tests

- Unit test `extractMarkdownLinks` and `extractWikilinks` in `tests/unit/`
- Unit test `resolveDocumentByUrl` and `resolveDocumentByTitle`
- Integration test: index two documents where doc A links to doc B's URL → verify `document_links` row created with `link_type = "references"`
- Integration test: Obsidian sync with two files where file A wikilinks to file B → verify link created
- Integration test: graph includes `document_links` edges

---

## Files to Modify

| File | Change |
|------|--------|
| `src/core/link-extractor.ts` | Add `extractMarkdownLinks()`, `extractWikilinks()` |
| `src/core/links.ts` | Add `"references"` type, `resolveDocumentByUrl()`, `resolveDocumentByTitle()`, `resolveDocumentLink()` |
| `src/core/indexing.ts` | Call `extractAndStoreDocumentLinks()` after transaction |
| `src/connectors/obsidian.ts` | Wire `parsed.wikilinks` to `createLink()` after indexing |
| `src/core/graph.ts` | Add `document_links` edges; extend `GraphEdge.type` union |
| `src/db/schema.ts` | No schema change needed (link_type is TEXT, enforced at app level) |
| `tests/unit/` | Tests for new extractor and resolver functions |
| `tests/integration/` | End-to-end link detection tests |
