feat: bulk operations for document management #168

@RobertLD

Overview

Add bulk operations for managing multiple documents at once: bulk tag, bulk move to topic, bulk delete, and bulk re-embed.

Why

Managing a knowledge base with 10,000+ documents one at a time is painful. Teams need to reclassify, retag, or clean up documents in batches.

Implementation Plan

1. Core Module (src/core/bulk.ts)

export interface BulkSelector {
  documentIds?: string[];          // explicit list
  topic?: string;                  // all docs in topic
  tags?: string[];                 // all docs with these tags
  olderThan?: string;              // ISO date filter
  query?: string;                  // match by search query
}

export function bulkAddTags(db, selector: BulkSelector, tags: string[]): BulkResult
export function bulkRemoveTags(db, selector: BulkSelector, tags: string[]): BulkResult
export function bulkMoveTopic(db, selector: BulkSelector, topicId: string): BulkResult
export function bulkDelete(db, selector: BulkSelector): BulkResult
export function bulkReembed(db, provider, selector: BulkSelector): Promise<BulkResult>

export interface BulkResult {
  affected: number;
  errors: Array<{ documentId: string; error: string }>;
}
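To make the selector semantics concrete, here is a minimal sketch of how a BulkSelector might resolve to a set of documents. The issue does not say how multiple criteria combine, so this assumes they are ANDed together and that a document must carry every listed tag. The `Doc` shape, the `resolveSelector` name, and the in-memory array standing in for the database are all hypothetical, and `query` matching is left out.

```typescript
// Hypothetical in-memory document shape; the real schema is not given in this issue.
interface Doc {
  id: string;
  topic: string;
  tags: string[];
  createdAt: string; // ISO 8601 date
}

interface BulkSelector {
  documentIds?: string[];
  topic?: string;
  tags?: string[];
  olderThan?: string;
  query?: string; // not handled in this sketch
}

// Assumption: criteria combine with AND, and `tags` means "has all of these tags".
function resolveSelector(docs: Doc[], selector: BulkSelector): Doc[] {
  const cutoff =
    selector.olderThan !== undefined ? Date.parse(selector.olderThan) : undefined;
  if (cutoff !== undefined && Number.isNaN(cutoff)) {
    throw new Error(`olderThan is not a valid date: ${selector.olderThan}`);
  }
  const ids = selector.documentIds ? new Set(selector.documentIds) : undefined;
  const wantedTags = selector.tags;

  return docs.filter((doc) => {
    if (ids && !ids.has(doc.id)) return false;
    if (selector.topic !== undefined && doc.topic !== selector.topic) return false;
    if (wantedTags && !wantedTags.every((tag) => doc.tags.includes(tag))) return false;
    if (cutoff !== undefined && !(Date.parse(doc.createdAt) < cutoff)) return false;
    return true;
  });
}
```

Note that an empty selector matches every document in this sketch; a real implementation may prefer to reject it outright.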

2. CLI Commands

  • libscope bulk tag --topic engineering --add api,internal — tag all docs in topic
  • libscope bulk move --tags stale --to archive — move tagged docs to topic
  • libscope bulk delete --older-than 2024-01-01 --confirm — delete old docs
  • libscope bulk reembed --topic python — re-embed specific topic

All destructive operations require the --confirm flag or an interactive confirmation.
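The confirmation rule above could be gated by a small pure function, sketched below. The issue does not list which operations count as destructive, so this assumes only delete does. The names `decideAction`, `parseList`, and `CliFlags` are hypothetical, and the actual argument parsing is left to whatever CLI library the project uses.

```typescript
type BulkOp = "tag" | "move" | "delete" | "reembed";

// Hypothetical flag bag, already parsed from argv.
interface CliFlags {
  confirm: boolean;
  dryRun: boolean;
}

// Assumption: only `delete` is treated as destructive here.
const DESTRUCTIVE_OPS: ReadonlySet<BulkOp> = new Set<BulkOp>(["delete"]);

// Splits a comma-separated flag value such as "api,internal" into clean entries.
function parseList(value: string): string[] {
  return value
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Returns what the CLI should do next: preview only, ask the user, or run.
function decideAction(op: BulkOp, flags: CliFlags): "preview" | "prompt" | "run" {
  if (flags.dryRun) return "preview";
  if (DESTRUCTIVE_OPS.has(op) && !flags.confirm) return "prompt";
  return "run";
}
```

Keeping the decision separate from the prompt itself makes the rule easy to unit-test without simulating terminal input.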

3. REST API Endpoints

  • POST /api/v1/bulk/tag — body: { selector, addTags, removeTags }
  • POST /api/v1/bulk/move — body: { selector, topicId }
  • POST /api/v1/bulk/delete — body: { selector }
  • POST /api/v1/bulk/reembed — body: { selector }
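As an illustration of the request shapes listed above, the following sketch validates the body of POST /api/v1/bulk/tag before any work is done. It is framework-agnostic, and several details are assumptions rather than part of the plan: `addTags` and `removeTags` are treated as optional and default to empty arrays, at least one of them must be non-empty, and the `parseTagBody` name and result shape are hypothetical.

```typescript
interface TagRequest {
  selector: Record<string, unknown>;
  addTags: string[];
  removeTags: string[];
}

type ParseResult =
  | { ok: true; value: TagRequest }
  | { ok: false; error: string };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Validates an already JSON-decoded request body.
function parseTagBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { selector, addTags = [], removeTags = [] } = body as Record<string, unknown>;
  if (typeof selector !== "object" || selector === null || Array.isArray(selector)) {
    return { ok: false, error: "selector must be an object" };
  }
  if (!isStringArray(addTags) || !isStringArray(removeTags)) {
    return { ok: false, error: "addTags and removeTags must be arrays of strings" };
  }
  // Assumption: a tag request that neither adds nor removes anything is rejected.
  if (addTags.length === 0 && removeTags.length === 0) {
    return { ok: false, error: "provide at least one of addTags or removeTags" };
  }
  return {
    ok: true,
    value: { selector: selector as Record<string, unknown>, addTags, removeTags },
  };
}
```

A route handler could map `ok: false` to a 400 response and pass `value` to the core module otherwise.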

4. Safety Features

  • Dry-run mode: --dry-run shows what would be affected without changes
  • Confirmation prompt for destructive ops (CLI)
  • Max batch size (1000) to prevent accidental mass operations
  • Undo via versioning (documents have version history)
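The dry-run and batch-limit safeguards might fit together roughly as sketched below, along with the partial-failure reporting that BulkResult implies. This is only an illustration under assumptions: an oversized batch is rejected rather than truncated, a dry run reports the would-be count in `affected`, and `runBulk` and its `apply` callback are hypothetical names standing in for the real per-document operation.

```typescript
const MAX_BATCH_SIZE = 1000; // limit taken from the plan above

interface BulkOutcome {
  affected: number;
  errors: Array<{ documentId: string; error: string }>;
}

// Applies `apply` to each document ID, collecting failures instead of aborting.
function runBulk(
  documentIds: string[],
  apply: (documentId: string) => void,
  options: { dryRun?: boolean } = {},
): BulkOutcome {
  // Assumption: exceeding the limit is an error, not a silent truncation.
  if (documentIds.length > MAX_BATCH_SIZE) {
    throw new Error(
      `Batch of ${documentIds.length} exceeds the limit of ${MAX_BATCH_SIZE}`,
    );
  }
  // Assumption: a dry run reports how many documents would be touched.
  if (options.dryRun) {
    return { affected: documentIds.length, errors: [] };
  }

  const outcome: BulkOutcome = { affected: 0, errors: [] };
  for (const documentId of documentIds) {
    try {
      apply(documentId);
      outcome.affected += 1;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcome.errors.push({ documentId, error: message });
    }
  }
  return outcome;
}
```

Checking the size limit before the dry-run branch means a preview surfaces the same rejection the real run would.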

5. Tests

  • Selector resolution (by topic, tags, date, query)
  • Each bulk operation type
  • Dry-run mode
  • Error handling (partial failures)
  • Safety limits
  • 75%+ coverage

Complexity: Small-Medium (~350 lines)

Files touched: new core/bulk.ts, cli/index.ts, api/routes.ts, core/index.ts
