The KAI system is designed to support distributed garbage collection across multiple networked nodes, enabling transparent remote procedure calls with the syntax:
Future<T> result = proxy->method(a, b, c);
This document describes the distributed GC architecture, its goals, current status, and implementation strategy.
- Network Transparency: Remote objects should behave like local objects
- Automatic Memory Management: Distributed GC should collect unreachable objects across nodes
- Low Latency: Async Future pattern for non-blocking remote calls
- Fault Tolerance: Handle node failures gracefully
- Scalability: Support tens of thousands of connected nodes
Registry-Based GC (Include/KAI/Core/Registry.h)
- Tri-color mark-and-sweep algorithm
- Color sets: White (not yet reached; condemned if still white when marking ends), Grey (reached, references not yet scanned), Black (reached and fully scanned)
- Handle-based references (not raw pointers)
- Local object lifecycle management
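To make the color invariants concrete, here is a minimal single-node sketch of tri-color mark-and-sweep. MiniRegistry, integer handles, and the adjacency map are assumptions for illustration. They are not KAI's Registry interface.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Illustrative only: integer handles and MiniRegistry are hypothetical,
// not the KAI Registry API.
using Handle = int;

enum class Color { White, Grey, Black };

struct MiniRegistry {
    std::unordered_map<Handle, std::vector<Handle>> edges;  // object -> objects it references
    std::unordered_map<Handle, Color> color;

    void Add(Handle h) { edges[h]; color[h] = Color::White; }
    void Reference(Handle from, Handle to) { edges[from].push_back(to); }

    // Mark: roots are shaded grey; each grey object shades its white children
    // grey and then turns black. Marking ends when no grey objects remain.
    void Mark(const std::vector<Handle>& roots) {
        for (auto& entry : color) entry.second = Color::White;
        std::vector<Handle> grey;  // worklist of grey objects
        for (Handle r : roots) {
            auto it = color.find(r);
            if (it != color.end() && it->second == Color::White) {
                it->second = Color::Grey;
                grey.push_back(r);
            }
        }
        while (!grey.empty()) {
            Handle h = grey.back();
            grey.pop_back();
            for (Handle child : edges[h]) {
                auto it = color.find(child);
                if (it != color.end() && it->second == Color::White) {
                    it->second = Color::Grey;
                    grey.push_back(child);
                }
            }
            color[h] = Color::Black;
        }
    }

    // Sweep: anything still white is unreachable and is released.
    std::size_t ReleaseWhite() {
        std::size_t released = 0;
        for (auto it = color.begin(); it != color.end();) {
            if (it->second == Color::White) {
                edges.erase(it->first);
                it = color.erase(it);
                ++released;
            } else {
                ++it;
            }
        }
        return released;
    }
};
```

Because marking starts only from roots, an isolated local cycle stays white and is swept like any other unreachable object.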
Key Operations:
void MarkAndSweep(Object root); // Local GC pass
void TriColor(); // Tri-color marking
void ReleaseWhite(); // Collect white objects
Network Handles (Include/KAI/Network/NetHandle.h)
- Unique identifiers for remote objects
- Format: {nodeId, localHandle}
- Cross-node reference tracking
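For a {nodeId, localHandle} pair to key cross-node tables, it needs equality and a hash. The sketch below is one way to provide both. The field widths, the IsLocalTo helper, and the hash-combining constant are assumptions and are not taken from NetHandle.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical layout of a network handle: owning node plus the object's
// handle in that node's local Registry.
struct NetHandle {
    std::uint32_t nodeId = 0;       // node that owns the object
    std::uint64_t localHandle = 0;  // handle in the owning node's Registry

    bool operator==(const NetHandle& other) const {
        return nodeId == other.nodeId && localHandle == other.localHandle;
    }
    bool IsLocalTo(std::uint32_t node) const { return nodeId == node; }
};

// Combines both fields so handles can key unordered containers.
struct NetHandleHash {
    std::size_t operator()(const NetHandle& h) const {
        std::size_t a = std::hash<std::uint32_t>{}(h.nodeId);
        std::size_t b = std::hash<std::uint64_t>{}(h.localHandle);
        return a ^ (b + 0x9e3779b9 + (a << 6) + (a >> 2));
    }
};

using NetHandleSet = std::unordered_set<NetHandle, NetHandleHash>;
```

The same localHandle on two different nodes names two different objects, which is why both fields take part in equality.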
Future Pattern (Include/KAI/Network/Future.h)
template <class T>
struct Future {
int Id; // Unique operation ID
ResponseType Response; // Success/Failure/Timeout
bool Complete; // Completion flag
std::optional<T> Value; // Result value
std::string ErrorMessage; // Error details
};
Proxy/Agent Model:
- Proxy: Client-side stub that forwards calls to remote nodes
- Agent: Server-side handler that executes calls locally
- Generated from Tau IDL definitions
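Each Future carries an Id, so a proxy needs a way to route an agent's reply back to the right Future. Below is a minimal sketch of a client-side pending-call table. PendingCalls and its methods are hypothetical, and the ResponseType enumerators (including None) are assumptions based on the Success/Failure/Timeout outcomes listed above.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Assumed enumerators; None marks a call that has not completed yet.
enum class ResponseType { None, Success, Failure, Timeout };

template <class T>
struct Future {
    int Id = 0;
    ResponseType Response = ResponseType::None;
    bool Complete = false;
    std::optional<T> Value;
    std::string ErrorMessage;
};

// Hypothetical table a proxy could keep: each outgoing call gets a fresh Id,
// and the agent's reply is matched back to it by that Id.
template <class T>
class PendingCalls {
public:
    int Begin() {
        int id = nextId_++;
        pending_[id].Id = id;
        return id;
    }
    // Reply handlers; both return false for unknown or duplicate replies.
    bool Resolve(int id, T value) { return Finish(id, ResponseType::Success, std::move(value), ""); }
    bool Fail(int id, std::string error) { return Finish(id, ResponseType::Failure, std::nullopt, std::move(error)); }

    // Removes and returns the future once it has completed.
    std::optional<Future<T>> Take(int id) {
        auto it = pending_.find(id);
        if (it == pending_.end() || !it->second.Complete) return std::nullopt;
        Future<T> f = std::move(it->second);
        pending_.erase(it);
        return f;
    }

private:
    bool Finish(int id, ResponseType r, std::optional<T> value, std::string error) {
        auto it = pending_.find(id);
        if (it == pending_.end() || it->second.Complete) return false;
        it->second.Response = r;
        it->second.Value = std::move(value);
        it->second.ErrorMessage = std::move(error);
        it->second.Complete = true;
        return true;
    }
    int nextId_ = 1;
    std::unordered_map<int, Future<T>> pending_;
};
```

Rejecting a second reply for the same Id keeps a retransmitted response from overwriting a result the caller may already have read.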
Lease-Based Management:
- Each remote reference has a lease with expiration time
- Client must periodically renew leases to keep objects alive
- Objects with expired leases become eligible for collection
Implementation:
interface ILeaseManager {
Future<int> AcquireLease(object remoteObject, int duration);
Future<void> RenewLease(int leaseId, int additionalDuration);
Future<void> ReleaseLease(int leaseId);
}
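On the agent side, ILeaseManager implies a table from lease ids to expiry times. The following is a minimal single-process sketch, assuming that times are passed in explicitly as milliseconds on a monotonic clock and that RenewLease's additionalDuration sets the new remaining time. LeaseTable and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Agent-side lease table (no networking). Explicit times keep it deterministic.
class LeaseTable {
public:
    using Millis = std::int64_t;

    // Grants a lease on objectId until now + duration; returns the lease id.
    int Acquire(int objectId, Millis now, Millis duration) {
        int id = nextLeaseId_++;
        leases_[id] = Lease{objectId, now + duration};
        return id;
    }

    // Extends a live lease to now + additional. An expired lease cannot be
    // revived, because the object may already have been collected.
    bool Renew(int leaseId, Millis now, Millis additional) {
        auto it = leases_.find(leaseId);
        if (it == leases_.end() || it->second.expiresAt <= now) return false;
        it->second.expiresAt = now + additional;
        return true;
    }

    void Release(int leaseId) { leases_.erase(leaseId); }

    // True while at least one unexpired lease covers objectId.
    bool IsLeased(int objectId, Millis now) const {
        for (const auto& entry : leases_)
            if (entry.second.objectId == objectId && entry.second.expiresAt > now) return true;
        return false;
    }

    // Drops expired leases; returns how many were removed.
    std::size_t PurgeExpired(Millis now) {
        std::size_t removed = 0;
        for (auto it = leases_.begin(); it != leases_.end();) {
            if (it->second.expiresAt <= now) { it = leases_.erase(it); ++removed; }
            else ++it;
        }
        return removed;
    }

private:
    struct Lease {
        int objectId;
        Millis expiresAt;
    };
    int nextLeaseId_ = 1;
    std::unordered_map<int, Lease> leases_;
};
```

An object whose last lease has been released or has expired is then left to the local collector.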
Advantages:
- Fault tolerance: Node crashes automatically expire leases
- Simple protocol: No need for complex distributed consensus
- Predictable behavior: Clear lifecycle boundaries
Disadvantages:
- Network overhead: Periodic lease renewals
- Latency-sensitive: Must renew before expiration
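The two disadvantages pull in opposite directions: renewing often costs messages, while renewing late risks expiry. A hypothetical policy that balances them is to renew at half the lease, moved earlier when the round-trip time leaves too little headroom. NextRenewalDelay, RenewalsPerHour, and the safetyRtts knob are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

using Millis = std::int64_t;

// Delay after a grant or renewal before the next renewal is sent.
// safetyRtts is an assumed tuning knob: round trips of headroom to keep.
Millis NextRenewalDelay(Millis leaseDuration, Millis roundTrip, int safetyRtts = 3) {
    Millis halfLife = leaseDuration / 2;
    Millis latestSafe = leaseDuration - safetyRtts * roundTrip;
    return std::max<Millis>(0, std::min(halfLife, latestSafe));
}

// Renewal messages per lease per hour under this policy (the network overhead).
double RenewalsPerHour(Millis leaseDuration, Millis roundTrip) {
    Millis delay = NextRenewalDelay(leaseDuration, roundTrip);
    if (delay == 0) return std::numeric_limits<double>::infinity();  // lease too short for this link
    return 3600000.0 / static_cast<double>(delay);
}
```

With a 60-second lease and a 50 ms round trip, this policy sends 120 renewals per lease per hour. That overhead is why batching renewals per remote node matters at scale.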
Cross-Node Reachability:
- Extend local tri-color marking across network boundaries
- Track inter-node references explicitly
- Coordinate GC cycles across nodes
Reference Tracking Interface:
interface IReferenceTracker {
Future<void> TrackReference(int sourceNode, int targetNode, int objectId);
Future<void> UntrackReference(int sourceNode, int targetNode, int objectId);
Future<int[]> GetReferencingNodes(int objectId);
}
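On the node that owns an object, reference tracking can be as simple as a per-object count of references held by each remote node. The sketch below keeps counts rather than a set, so that one node may hold several proxies to the same object. ReferenceTracker is a hypothetical class, and it omits targetNode because the owning node is the target.

```cpp
#include <map>
#include <vector>

// Owner-side tracker: objectId -> (sourceNode -> number of references).
class ReferenceTracker {
public:
    void Track(int sourceNode, int objectId) { ++refs_[objectId][sourceNode]; }

    void Untrack(int sourceNode, int objectId) {
        auto obj = refs_.find(objectId);
        if (obj == refs_.end()) return;
        auto node = obj->second.find(sourceNode);
        if (node == obj->second.end()) return;
        if (--node->second == 0) obj->second.erase(node);  // node holds no more references
        if (obj->second.empty()) refs_.erase(obj);          // no remote holders at all
    }

    // Sorted, because std::map iterates in key order.
    std::vector<int> GetReferencingNodes(int objectId) const {
        std::vector<int> nodes;
        auto obj = refs_.find(objectId);
        if (obj != refs_.end())
            for (const auto& entry : obj->second) nodes.push_back(entry.first);
        return nodes;
    }

    bool IsRemotelyReferenced(int objectId) const { return refs_.count(objectId) != 0; }

private:
    std::map<int, std::map<int, int>> refs_;
};
```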
Distributed Mark Phase:
- Start mark phase on root node
- For each remote reference, send mark message to target node
- Target node marks object and continues tracing
- Aggregate reachability information
- Sweep unreachable objects on all nodes
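The mark phase above can be simulated in one process, with a message queue standing in for the network. The sketch below makes several simplifying assumptions. It starts from the local roots of every node, not only one root node. It keeps marks in a single set for brevity, where real nodes would each keep their own. It also ignores termination detection and mutation during marking, both of which a real protocol must handle. SimNode, Ref, DistributedMark, and DistributedSweep are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Ref = std::pair<int, int>;  // (nodeId, objectId): a stand-in for NetHandle

struct SimNode {
    std::set<int> objects;                  // objects owned by this node
    std::set<int> roots;                    // local roots
    std::map<int, std::vector<Ref>> edges;  // objectId -> references, local or remote
};

// Each queued Ref is a "mark this object" message to the owning node.
std::set<Ref> DistributedMark(const std::map<int, SimNode>& nodes) {
    std::set<Ref> marked;
    std::deque<Ref> messages;
    for (const auto& [id, node] : nodes)
        for (int r : node.roots) messages.push_back({id, r});
    while (!messages.empty()) {
        Ref ref = messages.front();
        messages.pop_front();
        auto n = nodes.find(ref.first);
        if (n == nodes.end() || !n->second.objects.count(ref.second)) continue;  // dangling
        if (!marked.insert(ref).second) continue;                                // already marked
        auto e = n->second.edges.find(ref.second);
        if (e == n->second.edges.end()) continue;
        for (const Ref& next : e->second) messages.push_back(next);  // continue tracing
    }
    return marked;
}

// Every node deletes the objects that no mark message reached.
std::size_t DistributedSweep(std::map<int, SimNode>& nodes, const std::set<Ref>& marked) {
    std::size_t freed = 0;
    for (auto& [id, node] : nodes) {
        for (auto it = node.objects.begin(); it != node.objects.end();) {
            if (marked.count({id, *it})) { ++it; continue; }
            node.edges.erase(*it);
            it = node.objects.erase(it);
            ++freed;
        }
    }
    return freed;
}
```

Consider two objects on different nodes that reference only each other. No mark message ever reaches them, so both are swept. This is the cross-node cycle that leases alone cannot collect.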
Advantages:
- Complete garbage detection (no leaks)
- No false collection (safe)
- Unified with local GC algorithm
Disadvantages:
- Complex coordination
- Requires distributed consensus
- Vulnerable to node failures during GC
Combine Leases + Distributed Tracing:
- Use leases for fault tolerance and initial approximation
- Use distributed tracing for complete collection
- Perform distributed trace periodically or on memory pressure
Protocol:
- Normal Operation: Lease-based management
  - Proxies acquire leases on remote objects
  - Automatic renewal or explicit release
  - Fast, low-overhead
- Full Collection: Triggered by memory pressure
  - Coordinate distributed mark-and-sweep
  - Collect circular garbage across nodes
  - Update lease states based on reachability
- Node Failure: Lease expiration handles crashes
  - Objects with expired leases marked for collection
  - Remaining nodes continue operation
  - No distributed consensus required for failures
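The three cases reduce to a per-object decision. The sketch below shows one hypothetical way to combine the signals. ObjectState and IsCollectible are illustrative names, not KAI APIs. The last case is the one leases cannot settle: objects in a cross-node cycle keep renewing each other's leases, so only a completed trace shows they are garbage.

```cpp
// Hypothetical summary of what an owning node knows about one object.
struct ObjectState {
    bool locallyReachable;  // marked by the node's own tri-color pass
    bool hasLiveLease;      // some remote proxy holds an unexpired lease
    bool traceCompleted;    // a full distributed trace finished this cycle
    bool markedByTrace;     // ...and that trace reached this object
};

bool IsCollectible(const ObjectState& s) {
    if (s.locallyReachable) return false;         // local roots always keep an object alive
    if (!s.hasLiveLease) return true;             // released, or the holder crashed and the lease expired
    return s.traceCompleted && !s.markedByTrace;  // lease kept alive only by garbage
}
```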
Implemented:
- Local GC: Tri-color mark-and-sweep in Registry (100% tests passing)
- Network Layer: Node, Proxy, Agent infrastructure (Include/KAI/Network/)
- Future: Basic Future template with success/failure tracking
- Tau IDL: Interface definition language for proxy/agent generation
- Code Generation: Automatic proxy/agent generation from Tau definitions
Planned:
- Distributed Reference Tracking: Cross-node reference management
- Lease Management: Automatic lease acquisition and renewal
- Proxy Lifecycle: Proper cleanup when proxies go out of scope
- Cycle Detection: Cross-node circular garbage detection
- Coordinated GC: Distributed mark-and-sweep coordination
- Object Migration: Move objects between nodes for load balancing
- Weak References: Network-transparent weak references
- Finalization: Distributed finalization queue
Phase 1 goal: Enable the Future<T> result = proxy->method(a, b, c) syntax
Tasks:
- Define Future template
- Implement Proxy/Agent base classes
- Generate proxy code from Tau IDL
- Implement basic remote method invocation
- Handle serialization/deserialization of arguments and results
Tests: TauFutureProxyTests.cpp (50 tests)
Phase 2 goal: Remote objects don't leak when proxies are destroyed
Tasks:
- Implement ILeaseManager interface
- Automatic lease acquisition in proxy constructors
- Automatic lease release in proxy destructors
- Lease renewal background task
- Handle lease expiration on agent side
Estimated Tests: 30 tests for lease management
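Acquiring a lease in the proxy constructor and releasing it in the destructor is an RAII pattern. The sketch below shows a move-only guard that releases exactly once. LeaseGuard is hypothetical, and the release callback stands in for an ILeaseManager::ReleaseLease call. Its signature is an assumption.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII guard a generated proxy could hold.
class LeaseGuard {
public:
    using Release = std::function<void(int)>;

    LeaseGuard(int leaseId, Release release)
        : leaseId_(leaseId), release_(std::move(release)) {}
    ~LeaseGuard() { Reset(); }

    LeaseGuard(const LeaseGuard&) = delete;  // a copy would release the lease twice
    LeaseGuard& operator=(const LeaseGuard&) = delete;
    LeaseGuard(LeaseGuard&& other) noexcept
        : leaseId_(std::exchange(other.leaseId_, -1)), release_(std::move(other.release_)) {}

    // Releases the lease now (if still held); safe to call more than once.
    void Reset() {
        if (leaseId_ >= 0 && release_) release_(leaseId_);
        leaseId_ = -1;
    }
    int Id() const { return leaseId_; }

private:
    int leaseId_;
    Release release_;
};
```

Making the guard move-only means that copying a proxy cannot silently create a second owner of the same lease.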
Phase 3 goal: Detect when remote objects are truly unreachable
Tasks:
- Implement IReferenceTracker interface
- Track outgoing references (local → remote)
- Track incoming references (remote → local)
- Store reference graph persistently
- Query reachability across nodes
Estimated Tests: 40 tests for reference tracking
Phase 4 goal: Collect circular garbage across nodes
Tasks:
- Implement ICycleDetector interface
- Coordinate mark phase across nodes
- Aggregate reachability information
- Perform distributed sweep
- Handle node failures during GC
Estimated Tests: 50 tests for distributed GC
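One conservative answer to node failures during GC is to sweep only when every participant has reported that its mark phase is complete. If a node goes silent, its outgoing references are unknown, so nothing it might reference can be proven garbage. In that case the cycle is abandoned and leases remain the safety net. DecideSweep and GcOutcome are hypothetical names for this sketch.

```cpp
#include <set>

enum class GcOutcome { Sweep, Abort };

// Sweep only if every participating node acknowledged mark-phase completion.
GcOutcome DecideSweep(const std::set<int>& participants, const std::set<int>& acknowledged) {
    for (int node : participants)
        if (!acknowledged.count(node)) return GcOutcome::Abort;
    return GcOutcome::Sweep;
}
```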
Phase 5 goal: Production-ready distributed GC
Tasks:
- Object migration for load balancing
- Weak references across network
- Distributed finalization
- Performance optimization
- Monitoring and diagnostics
Estimated Tests: 60+ tests for advanced features
// Connect to remote node
node = createNetworkNode()
node.connect("192.168.1.10", 14589)
// Create proxy to remote calculator
proxy = node.CreateProxy<ICalculator>()
// Call remote method (non-blocking)
future = proxy->Add(5, 3)
// Wait for result
if future.Succeeded()
print(future.GetValue()) // 8
else
print("Error: " + future.ErrorMessage)
namespace Math {
interface ICalculator {
Future<int> Add(int a, int b);
Future<int> Multiply(int a, int b);
Future<float> Divide(float a, float b);
}
}
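class CalculatorProxy : public ICalculator {
Node* node_;
NetHandle remoteHandle_;
LeaseManager leases_;
Future<int> lease_; // Lease id arrives asynchronously
public:
CalculatorProxy(Node* node, NetHandle handle)
: node_(node), remoteHandle_(handle) {
// Acquire a 60-second lease on the remote object; the lease id is known only once the reply arrives
lease_ = leases_.AcquireLease(handle, 60000);
}
~CalculatorProxy() {
// Release by lease id, not by object handle; a lease that never resolved simply expires on the agent
if (lease_.Complete && lease_.Value)
leases_.ReleaseLease(*lease_.Value);
}
Future<int> Add(int a, int b) override {
// Serialize call: method name, then arguments
BinaryStream stream;
stream << "Add" << a << b;
// Send to remote node; the Future completes when the agent replies
return node_->SendRequest<int>(remoteHandle_, stream);
}
// Multiply and Divide follow the same pattern
};
class CalculatorAgent : public ICalculator {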
std::unique_ptr<Calculator> impl_;
public:
CalculatorAgent() : impl_(std::make_unique<Calculator>()) {}
Future<int> Add(int a, int b) override {
// Execute locally
int result = impl_->Add(a, b);
// Return completed future
Future<int> future;
future.Value = result;
future.Complete = true;
future.Response = ResponseType::Returned;
return future;
}
};
- Local GC correctness (Registry tests): ✅ 147/147 passing
- Proxy/Agent generation (Tau tests): ✅ 100/109 passing
- Future patterns: 🚧 TauFutureProxyTests.cpp (50 tests)
Integration tests:
- Cross-node references
- Lease management
- Cycle detection
- Node failure scenarios
Performance benchmarks:
- Throughput: RPC calls/second
- Latency: Round-trip time for remote calls
- Scalability: Performance with N nodes (10, 100, 1000, 10000)
- Memory: Leak detection under sustained load
Stress tests:
- Long-running distributed computations
- Random node failures and recoveries
- Heavy GC pressure
- Network partitions
// Enable distributed GC tracing
node.SetGCTraceLevel(3); // Verbose GC logging
// Trace remote calls
node.SetRPCTraceLevel(2); // Log all RPC calls
interface IGCDiagnostics {
Future<MemoryStats> GetMemoryStats(int nodeId);
Future<int> GetObjectCount(int nodeId);
Future<int[]> GetLiveLeases();
Future<ReferenceGraph> GetReferenceGraph();
}
Metrics to monitor:
- Lease acquisition/release rate
- Lease renewal success rate
- GC pause time (local and distributed)
- Cross-node reference count
- Remote object count per node
- RPC latency percentiles (p50, p95, p99)
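Latency percentiles have several definitions. The sketch below uses the nearest-rank method over recorded round-trip times, where rank = ceil(p * N / 100) in 1-based terms. It sorts a copy of the samples. A long-running monitor would more likely use a streaming histogram, but the value reported for p50, p95, or p99 means the same thing.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Nearest-rank percentile of RPC round-trip samples (e.g. milliseconds).
double Percentile(std::vector<double> samples, double p) {
    if (samples.empty() || p <= 0.0 || p > 100.0)
        throw std::invalid_argument("need samples and 0 < p <= 100");
    std::sort(samples.begin(), samples.end());
    // Multiply before dividing so that e.g. p = 95, N = 100 gives exactly 95.
    auto rank = static_cast<std::size_t>(std::ceil(p * samples.size() / 100.0));
    return samples[rank - 1];
}
```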
- GC Cycles: Registry.cpp:201 HACK to avoid local cycles
  - Affects: Local GC only
  - Workaround: Manual cycle breaking
  - Fix planned: Proper cycle detection algorithm
- ENet Stub: Network layer uses stub implementation
  - Affects: Real network testing
  - Workaround: In-memory testing only
  - Fix planned: Full ENet integration or alternative
- Clock Synchronization: Lease management assumes synchronized clocks
  - Impact: Incorrect expiration with clock skew
  - Mitigation: Use relative time offsets, NTP
- Network Partitions: Current design does not handle network splits well
  - Impact: Inconsistent state after the partition heals
  - Mitigation: Partition-aware protocols, conflict resolution
- Memory Overhead: Each proxy carries lease state
  - Impact: Higher memory usage with many proxies
  - Mitigation: Proxy pooling, batch lease management
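The relative-time mitigation for clock skew can be sketched as follows. The lease carries only a duration, and the client measures it on its own monotonic clock, starting from when it sent the request. In real time, the server's timer starts after the request was sent, so the client's deadline is conservative as long as both clocks tick at nearly the same rate. ClientLeaseView and ShouldRenewNow are hypothetical names.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Client-side view of a lease: local send time plus granted duration.
// No server wall-clock timestamp is involved, so skew between nodes cancels out.
struct ClientLeaseView {
    Clock::time_point requestSent;  // local monotonic time the acquire/renew was sent
    Ms granted;                     // duration the server granted

    // Never later than the server's real expiry (assuming similar clock rates).
    Clock::time_point SafeDeadline() const { return requestSent + granted; }

    // True once less than `margin` remains before the safe deadline.
    bool ShouldRenewNow(Clock::time_point now, Ms margin) const {
        return now + margin >= SafeDeadline();
    }
};
```

NTP still helps with logs and diagnostics, but under this scheme lease correctness no longer depends on it.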
- Networking.md: Overview of network system
- NetworkArchitecture.md: Detailed network architecture
- PeerToPeerNetworking.md: P2P implementation
- PROJECT_ANALYSIS.md: Complete project analysis
- Java RMI: Lease-based distributed GC
- .NET Remoting: Lease-based with sponsors
- Erlang: Per-process GC, message passing
- Orleans: Virtual actors with automatic lifecycle
KAI's distributed garbage collection system aims to provide network-transparent object management with the simple syntax Future<T> result = proxy->method(a,b,c). The hybrid lease + tracing approach balances fault tolerance, correctness, and performance.
Current focus (Phases 1 and 2) is on establishing the basic infrastructure and lease-based lifetime management. Later phases will add complete distributed tracing and advanced features for production use.
Status: Phase 1 in progress, with 50 new tests validating the Future proxy pattern and 50 new Rho iteration tests ensuring refactoring safety.