Conversation

@mikeleppane

Summary

This PR implements a comprehensive performance improvement plan that dramatically reduces Language Server startup time and improves edit responsiveness.

  • Reuse the executor for library loading instead of spawning new processes
  • Cache resource LibraryDoc objects to .robotcode_cache/*/resource/
  • Only invalidate namespaces that actually import the changed file
  • Allow up to 4 concurrent library loads (max_workers=min(cpu_count, 4))
  • O(1) lookup of documents importing a given source via a reverse dependency map (sketched below)
  • get_library_users() and get_variables_users() for instant change propagation
  • Persist namespace initialization state to disk
  • Skip full import resolution on warm starts
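
The reverse dependency map is the core of the targeted invalidation. A minimal sketch of the idea (`DependencyIndex`, `register_import`, and the URI strings are illustrative names, not RobotCode's actual API; `_importers` mirrors the map in the architecture diagram below):

```python
from collections import defaultdict


class DependencyIndex:
    """Illustrative reverse dependency map: source path -> importing document URIs."""

    def __init__(self) -> None:
        self._importers: dict[str, set[str]] = defaultdict(set)

    def register_import(self, doc_uri: str, source: str) -> None:
        # Recorded while a namespace resolves its imports.
        self._importers[source].add(doc_uri)

    def get_importers(self, source: str) -> set[str]:
        # O(1) lookup of the documents to invalidate when `source` changes,
        # instead of an O(D) scan over every namespace in the workspace.
        return self._importers.get(source, set())
```

On a file-change notification, only the documents returned by `get_importers()` lose their namespaces; everything else stays warm.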

Testing Plan

  • 28 unit tests for PickleDataCache atomic writes
  • 20 unit tests for NamespaceMetaData and cached entries
  • 15 unit tests for ResourceMetaData cache keys
  • 11 integration tests for namespace caching behavior

Architecture

Architecture Overview

```mermaid
flowchart TB
    subgraph cache["Disk Cache (.robotcode_cache/)"]
        lib["libdoc/<br/>Library docs"]
        res["resource/<br/>Resource docs"]
        ns["namespace/<br/>Namespace state"]
    end

    subgraph perf["Performance Optimizations"]
        p1["Shared Process Pool"]
        p2["Resource Caching"]
        p3["Targeted Invalidation"]
        p4["Parallel Library Loading"]
        p5["O(1) Dependency Lookups"]
        p7["Namespace Caching"]
        p9["Atomic Writes"]
    end

    subgraph maps["Reverse Dependency Maps"]
        imp["_importers<br/>source → documents"]
        lu["_library_users<br/>lib → documents"]
        vu["_variables_users<br/>var → documents"]
    end

    p1 --> lib
    p2 --> res
    p7 --> ns
    p9 --> ns
    p3 --> imp
    p5 --> lu
    p5 --> vu
```

Cold vs Warm Start Flow

```mermaid
flowchart TB
    subgraph startup["IDE Startup"]
        open["User Opens VS Code"]
    end

    subgraph cold["Cold Start (No Cache) ~2-4 min"]
        c1["Parse .robot files"]
        c2["Resolve imports in parallel"]
        c3["Load libraries<br/>(shared executor)"]
        c4["Build namespaces"]
        c5["Save to cache<br/>(atomic write)"]
        c1 --> c2 --> c3 --> c4 --> c5
    end

    subgraph warm["Warm Start (Cache Hit) ~10-20 sec"]
        w1["Check .cache.pkl exists"]
        w2{"Validate:<br/>mtime + size?"}
        w3["Load (meta, spec)"]
        w4["Check environment identity"]
        w5["Restore namespace"]
        w1 --> w2
        w2 -->|"Match"| w3 --> w4 --> w5
        w2 -->|"Changed"| miss["Rebuild"]
    end

    subgraph runtime["Runtime Editing"]
        r1["File changed"]
        r2["O(1) lookup affected docs"]
        r3["Targeted invalidation"]
        r4["Rebuild only affected"]
        r1 --> r2 --> r3 --> r4
    end

    open --> cold
    open --> warm

    style cold fill:#ffcccc,stroke:#cc0000
    style warm fill:#ccffcc,stroke:#00cc00
    style runtime fill:#cce5ff,stroke:#0066cc
```

Cache Validation Chain

```mermaid
sequenceDiagram
    participant LS as Language Server
    participant Cache as Disk Cache
    participant FS as File System

    Note over LS,FS: Warm Start Validation

    LS->>Cache: Load .cache.pkl
    Cache-->>LS: (meta, spec) tuple

    LS->>FS: stat(source_file)
    FS-->>LS: mtime, size

    alt mtime matches
        alt size matches
            Note over LS: Skip content hash!<br/>(Fast path)
            LS->>LS: Check python_executable
            LS->>LS: Check sys_path_hash
            alt Environment matches
                LS->>LS: Restore from cache ✓
            else Environment changed
                LS->>LS: Rebuild namespace
            end
        else size differs
            LS->>FS: Read first+last 64KB
            FS-->>LS: content chunks
            LS->>LS: Compute tiered hash
            alt Hash matches
                LS->>LS: Restore from cache ✓
            else Hash differs
                LS->>LS: Rebuild namespace
            end
        end
    else mtime differs
        LS->>LS: Rebuild namespace
    end
```
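
The chain above can be read as plain Python. A sketch under assumptions (the `NamespaceMetaData` fields come from the commit message below; `tiered_hash`, `cache_is_valid`, and the exact comparison order are illustrative, not the actual implementation):

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceMetaData:
    # Validation fields named in the commit message; the exact field set is assumed.
    mtime: float
    size: int
    content_hash: str
    python_executable: str
    sys_path_hash: str


def tiered_hash(path: str, chunk: int = 64 * 1024) -> str:
    # Hash only the first and last 64 KiB: a cheap stand-in for a full content hash.
    h = hashlib.sha256()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        h.update(f.read(chunk))
        if size > chunk:
            f.seek(max(chunk, size - chunk))
            h.update(f.read(chunk))
    return h.hexdigest()


def cache_is_valid(meta: NamespaceMetaData, path: str, exe: str, paths_hash: str) -> bool:
    st = os.stat(path)
    if st.st_mtime != meta.mtime:
        return False  # mtime differs: rebuild, as in the diagram
    # Fast path: when mtime AND size both match, content hashing is skipped entirely.
    if st.st_size != meta.size and tiered_hash(path) != meta.content_hash:
        return False  # size changed and the content really differs: rebuild
    # Environment identity: a different interpreter or sys.path invalidates the cache.
    return exe == meta.python_executable and paths_hash == meta.sys_path_hash
```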

Add persistent namespace caching to significantly improve warm start
performance. Cached namespaces are loaded from disk instead of being
rebuilt from scratch.

Key changes:
- Add NamespaceMetaData and NamespaceCacheData frozen dataclasses
  for cache serialization with validation fields (mtime, content_hash,
  python_executable, sys_path_hash)
- Add atomic cache writes using temp file + rename pattern
- Add reverse dependency tracking for efficient library/variable
  change propagation (get_library_users, get_variables_users)
- Skip content hash computation when mtime AND size match
- Add ResourceMetaData for resource caching

Tests:
- Unit tests for PickleDataCache atomic writes (28 tests)
- Unit tests for NamespaceMetaData and cached entries (20 tests)
- Unit tests for ResourceMetaData cache keys (15 tests)
- Integration tests for namespace caching behavior (11 tests)
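
The "atomic cache writes" item corresponds to the classic temp-file-plus-rename pattern. A standalone sketch (`atomic_pickle_write` is an illustrative name, not the actual PickleDataCache method):

```python
import os
import pickle
import tempfile


def atomic_pickle_write(path: str, data: object) -> None:
    """Write a pickle atomically: temp file next to the target, then rename.

    Readers never see a half-written cache file, and a crash mid-write leaves
    any previous file intact. os.replace() is atomic only when source and
    destination share a filesystem, hence the temp file in the same directory.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes are on disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```
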
@mikeleppane force-pushed the feat(perf)/performance-improvements branch from 221fb3d to ec8f008 on January 14, 2026 at 18:35
Optimize workspace-wide reference operations from O(D) to O(k) where
D = total documents and k = documents actually referencing the target.

Changes:
- Add reverse index data structures in DocumentsCacheHelper to track
  which documents reference each keyword/variable
- Use stable (source, name) tuple keys resilient to cache invalidation
- Implement diff-based updates to handle removed references after edits
- Add get_keyword_ref_users() and get_variable_ref_users() for O(1) lookup
- Update Find References to use reverse index with workspace scan fallback
- Update unused keyword/variable detection to use reverse index
@mikeleppane (Author)

I've included the second performance optimization; it can be found in this commit: 59e0314

Problem

Previously, workspace-wide reference operations (Find References, unused keyword/variable detection) required scanning all documents in the workspace for each lookup. This resulted in O(D) per-lookup complexity, where D is the total number of documents.

Checking K keywords for unused-keyword detection therefore required O(K × D) operations; with 500 keywords across 1,000 documents, for example, that is 500,000 namespace scans per pass, causing noticeable delays in large workspaces.

Solution

Added a reverse index that maps each keyword/variable to the documents that reference it. This reduces lookup complexity from O(D) to O(k), where k is the number of documents that actually use the target (typically much smaller than D).

Architecture

Before: O(D) Workspace Scan

```mermaid
flowchart LR
    subgraph "Find References (Before)"
        A[Request: Find refs to 'My Keyword'] --> B[Scan ALL documents]
        B --> C[doc_001.robot]
        B --> D[doc_002.robot]
        B --> E[doc_003.robot]
        B --> F[...]
        B --> G[doc_999.robot]
        C --> H[Check namespace]
        D --> H
        E --> H
        F --> H
        G --> H
        H --> I[Return matches]
    end
```

After: O(k) Reverse Index Lookup

```mermaid
flowchart LR
    subgraph "Find References (After)"
        A[Request: Find refs to 'My Keyword'] --> B[Lookup reverse index]
        B --> C["index[('source', 'My Keyword')]"]
        C --> D[Return: doc_003, doc_047, doc_891]
        D --> E[Scan only 3 documents]
        E --> F[Return matches]
    end
```

Data Structures

```mermaid
flowchart TB
    subgraph "Reverse Index Structure"
        direction TB
        A["_keyword_ref_users<br/>dict[tuple[str, str], WeakSet[TextDocument]]"]
        B["Key: (source, name)<br/>e.g. ('common.resource', 'My Keyword')"]
        C["Value: WeakSet of documents<br/>that reference this keyword"]
        A --> B
        A --> C
    end

    subgraph "Forward Index (for diff-based updates)"
        direction TB
        D["_doc_keyword_refs<br/>WeakKeyDictionary[TextDocument, set]"]
        E["Key: TextDocument"]
        F["Value: set of (source, name) tuples<br/>this document references"]
        D --> E
        D --> F
    end
```
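
A sketch of how the two indexes cooperate on an edit (plain sets and strings stand in for the WeakSet/WeakKeyDictionary and TextDocument types from the diagram; `update_document_refs` is an illustrative name):

```python
from collections import defaultdict

RefKey = tuple[str, str]  # (source, name), stable across cache invalidation


class KeywordRefIndex:
    def __init__(self) -> None:
        # Reverse index: (source, name) -> documents referencing that keyword.
        self._keyword_ref_users: dict[RefKey, set[str]] = defaultdict(set)
        # Forward index: document -> keys it currently references.
        self._doc_keyword_refs: dict[str, set[RefKey]] = {}

    def update_document_refs(self, doc: str, new_refs: set[RefKey]) -> None:
        # Diff-based update: after an edit, remove only the references that
        # disappeared and add only the new ones.
        old_refs = self._doc_keyword_refs.get(doc, set())
        for key in old_refs - new_refs:
            self._keyword_ref_users[key].discard(doc)
        for key in new_refs - old_refs:
            self._keyword_ref_users[key].add(doc)
        self._doc_keyword_refs[doc] = new_refs

    def get_keyword_ref_users(self, key: RefKey) -> set[str]:
        # O(k): only the documents that actually use the keyword.
        return self._keyword_ref_users.get(key, set())
```

Find References for `('common.resource', 'My Keyword')` then scans only the documents returned by `get_keyword_ref_users()`, with the full workspace scan kept as a fallback for keys not yet in the index.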
