Python + .NET Proposal: Auto-Compaction First-Party Support in AgentThread #2673

@lavaman131

Problem

Long-running conversations inevitably exceed context window limits, causing failures or degraded performance. Currently, developers must manually track token counts and implement their own compaction logic—a tedious, error-prone process that every user of the framework has to solve independently.


Context

OpenAI recently made its gpt-5.1-codex-max model publicly available through the Responses endpoint, with built-in auto-truncation and a dedicated /responses/compact endpoint.

This proposal adds native auto-compaction capability to AgentThread. Compaction is disabled by default, so users must explicitly opt in. Once enabled, compaction happens automatically with no manual token checks required.


Design Philosophy

Core Principle: Explicit Opt-In, Then Forget

Compaction is disabled by default. Users must explicitly enable it—but once enabled, it works automatically with no manual token checks:

# Python - Explicitly enable compaction per-run
async for event in agent.run_stream(
    "Hello!",
    thread=thread,
    compaction=AutoCompactionConfig(  # Explicit opt-in
        compactor=TruncationCompactor(),
        threshold=100_000,
    )
):
    print(event.content, end="")

# No compaction parameter = no compaction (default)
async for event in agent.run_stream("Hello!", thread=thread):
    ...  # No compaction happens

# Or set agent-level default for convenience, override per-run
agent = ChatAgent(chat_client=client, compaction=default_config)
async for event in agent.run_stream("Hello!", thread=thread):  # Uses agent default
    ...
async for event in agent.run_stream("Hello!", thread=thread, compaction=None):  # Disable
    ...

// .NET - Explicitly enable compaction per-run
await foreach (var msg in agent.RunStreamingAsync(
    "Hello!",
    thread,
    new ChatClientAgentRunOptions
    {
        Compaction = new AutoCompactionConfig  // Explicit opt-in
        {
            Compactor = new TruncationCompactor(),
            Threshold = 100_000
        }
    }))
{
    Console.Write(msg.Content);
}

// No options or no Compaction property = no compaction (default)
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
    Console.Write(msg.Content);  // No compaction happens

// Or set agent-level default for convenience
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Compaction = defaultConfig  // Default for all runs when opted in
});

Design Principles

  1. Explicit opt-in — Compaction is disabled by default; users must enable it
  2. Zero manual checks — Once enabled, no if thread.token_count > threshold required
  3. Run-level configuration — Configure compaction per-run for maximum flexibility
  4. Agent-level defaults — Optional convenience defaults that can be overridden per-run
  5. Actual token tracking — Uses UsageDetails from provider responses (no heuristics)
  6. Follows existing patterns — Same as ChatOptions (agent default + run override)
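
Principles 3, 4 and 6 come down to one precedence rule: a per-run value wins over the agent default, and an explicit None disables compaction for that run. A minimal sketch of that rule follows. `Config` and `resolve` are hypothetical stand-ins for illustration, not part of the proposed API; only the `USE_AGENT_DEFAULT` sentinel name is taken from the proposal.

```python
from dataclasses import dataclass

# Sentinel meaning "the caller did not pass a compaction argument".
USE_AGENT_DEFAULT = object()


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for AutoCompactionConfig."""
    threshold: int


def resolve(run_value, agent_default):
    """Return the config that applies to a run: the run-level value wins unless it is the sentinel."""
    return agent_default if run_value is USE_AGENT_DEFAULT else run_value


agent_default = Config(threshold=100_000)
per_run = Config(threshold=50_000)

inherited = resolve(USE_AGENT_DEFAULT, agent_default)  # agent default applies
overridden = resolve(per_run, agent_default)           # per-run config applies
disabled = resolve(None, agent_default)                # explicit None disables compaction
never_enabled = resolve(USE_AGENT_DEFAULT, None)       # nothing configured anywhere
```

A sentinel is needed because None already carries a meaning ("disable for this run"), so it cannot also mean "argument omitted".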

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                      agent.run() / run_stream()                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Compaction Check (before LLM call)          │    │
│  │  ┌─────────────────────────────────────────────────┐    │    │
│  │  │  config = run_compaction ?? agent_compaction     │    │    │
│  │  │  if config and thread.token_count > threshold:   │    │    │
│  │  │      await thread.compact(config.compactor)      │    │    │
│  │  └─────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────┘    │
│                              │                                   │
│                              ▼                                   │
│                      [LLM Call + Response]                       │
│                              │                                   │
│                              ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              Usage Tracking (after LLM call)             │    │
│  │  thread.on_usage(response.usage_details)                 │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Proposed API

Python Implementation

1. Auto-Compaction Configuration

# In python/packages/core/agent_framework/_compaction.py

from pydantic import BaseModel, Field
from typing import Protocol, Sequence, runtime_checkable, Literal


class CompactionOptions(BaseModel):
    """Base compaction options. Compactors extend this with their own options."""
    
    max_tokens: int = Field(
        default=128_000,
        description="Maximum context window size for the model"
    )


class AutoCompactionConfig(BaseModel):
    """
    Configuration for automatic compaction.
    
    Pass to agent.run() or agent.run_stream() to enable automatic
    compaction when the token threshold is exceeded.
    
    Can also be set as agent-level default via ChatAgent constructor.
    """
    
    compactor: "CompactorProtocol"
    """The compactor to use for automatic compaction."""
    
    threshold: int = Field(
        default=100_000,
        description="Token count that triggers automatic compaction"
    )
    
    trigger: Literal["before_run", "after_run"] = Field(
        default="before_run",
        description="When to check and trigger compaction"
    )
    
    options: CompactionOptions | None = Field(
        default=None,
        description="Optional compactor-specific options"
    )
    
    # Pydantic v2 style, consistent with the model_dump() calls used by the compactors
    model_config = {"arbitrary_types_allowed": True}


class CompactionResult(BaseModel):
    """Result of a compaction operation."""
    original_count: int
    compacted_count: int
    original_tokens: int | None = None
    compacted_tokens: int | None = None


# Sentinel value for "use agent default"
USE_AGENT_DEFAULT = object()

2. Compactor Protocol

@runtime_checkable
class CompactorProtocol(Protocol):
    """
    Protocol for thread compaction strategies.
    
    Follows framework patterns: ChatMessageStoreProtocol, ChatClientProtocol, etc.
    """
    
    async def compact(
        self,
        messages: Sequence["ChatMessage"],
        options: CompactionOptions,
    ) -> Sequence["ChatMessage"]:
        """Compact the given messages."""
        ...
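
Because the protocol is structural, a custom strategy needs no base class. The sketch below shows a hypothetical `KeepLastN` compactor; `Msg`, `Opts` and `Compactor` are simplified stand-ins for ChatMessage, CompactionOptions and CompactorProtocol so that the snippet runs on its own.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, Sequence, runtime_checkable


@dataclass
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str


@dataclass
class Opts:
    """Hypothetical stand-in for CompactionOptions."""
    max_tokens: int = 128_000


@runtime_checkable
class Compactor(Protocol):
    async def compact(self, messages: Sequence[Msg], options: Opts) -> Sequence[Msg]: ...


class KeepLastN:
    """Illustrative strategy: keep only the last n messages."""

    def __init__(self, n: int) -> None:
        self.n = n

    async def compact(self, messages: Sequence[Msg], options: Opts) -> Sequence[Msg]:
        # Slice from an explicit index so n == 0 yields an empty list, not everything.
        return list(messages[max(len(messages) - self.n, 0):])


history = [Msg("user", f"m{i}") for i in range(6)]
compactor = KeepLastN(2)
is_compactor = isinstance(compactor, Compactor)  # structural check only
result = asyncio.run(compactor.compact(history, Opts()))
```

Note that `isinstance` against a `runtime_checkable` protocol only verifies that a `compact` attribute exists. It does not check the signature or that the method is async, so a mistyped implementation would still pass the check and fail later at call time.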

3. Token Usage Tracking on AgentThread

# In python/packages/core/agent_framework/_threads.py

from agent_framework._types import UsageDetails


class AgentThread:
    def __init__(
        self,
        *,
        service_thread_id: str | None = None,
        message_store: ChatMessageStoreProtocol | None = None,
        context_provider: AggregateContextProvider | None = None,
    ) -> None:
        # ... existing ...
        self._accumulated_usage: UsageDetails | None = None
    
    @property
    def token_count(self) -> int:
        """
        Returns the input token count reported by the most recent provider response.

        This approximates the current context window size. It does not include
        the output tokens of that response, which become part of the next
        request. Returns 0 if no usage has been tracked yet.
        """
        if self._accumulated_usage is None:
            return 0
        return self._accumulated_usage.input_token_count or 0
    
    @property
    def usage_details(self) -> UsageDetails | None:
        """Returns the accumulated usage details for this thread."""
        return self._accumulated_usage
    
    async def on_usage(self, usage: UsageDetails | None) -> None:
        """
        Called after each agent run to track token usage.
        
        The input_token_count from the latest response represents
        the current context window size.
        """
        if usage is None:
            return
        
        if self._accumulated_usage is None:
            self._accumulated_usage = UsageDetails(
                input_token_count=usage.input_token_count or 0,
                output_token_count=usage.output_token_count or 0,
                total_token_count=usage.total_token_count or 0,
            )
        else:
            # Latest input_token_count = current context size
            # Output tokens accumulate across the conversation
            self._accumulated_usage = UsageDetails(
                input_token_count=usage.input_token_count or 0,
                output_token_count=(
                    (self._accumulated_usage.output_token_count or 0) +
                    (usage.output_token_count or 0)
                ),
                total_token_count=usage.total_token_count or 0,
            )
    
    async def compact(
        self,
        compactor: CompactorProtocol,
        options: CompactionOptions | None = None,
    ) -> CompactionResult:
        """
        Manually compact the thread's message history.
        
        Note: With auto-compaction enabled, you typically don't need
        to call this method directly.
        
        Args:
            compactor: Any object implementing CompactorProtocol.
            options: Compaction options (uses defaults if not provided).
        
        Returns:
            CompactionResult with details about the compaction.
        
        Raises:
            ValueError: If thread is service-managed.
        """
        if self._service_thread_id is not None:
            raise ValueError(
                "Cannot compact service-managed threads. "
                "Compaction is only supported for client-managed threads."
            )
        
        if self._message_store is None:
            return CompactionResult(original_count=0, compacted_count=0)
        
        options = options or CompactionOptions()
        messages = await self._message_store.list_messages()
        original_count = len(messages)
        original_tokens = self.token_count
        
        compacted = await compactor.compact(messages, options)
        compacted_list = list(compacted)
        
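        # Update message store.
        # NOTE: this writes a private attribute and therefore assumes the default
        # in-memory store; other ChatMessageStoreProtocol implementations would
        # need a public way to replace their contents.
        self._message_store._messages = compacted_list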
        
        # Reset accumulated usage - next response will provide fresh count
        self._accumulated_usage = None
        
        return CompactionResult(
            original_count=original_count,
            compacted_count=len(compacted_list),
            original_tokens=original_tokens,
            compacted_tokens=None,  # Known after next provider response
        )
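
The tracking rule in on_usage (latest input count replaces the previous one, output counts accumulate) is easiest to see with numbers. The sketch below mirrors that rule with hypothetical `Usage` and `UsageTracker` stand-ins; the token figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for UsageDetails."""
    input_tokens: int = 0
    output_tokens: int = 0


class UsageTracker:
    """Mirrors the proposed rule: latest input replaces, output accumulates."""

    def __init__(self) -> None:
        self.accumulated: Usage | None = None

    @property
    def token_count(self) -> int:
        return self.accumulated.input_tokens if self.accumulated else 0

    def on_usage(self, usage: Usage) -> None:
        previous_output = self.accumulated.output_tokens if self.accumulated else 0
        self.accumulated = Usage(
            input_tokens=usage.input_tokens,  # replaces the previous value
            output_tokens=previous_output + usage.output_tokens,  # running total
        )

    def reset(self) -> None:
        """What compact() does to the usage state."""
        self.accumulated = None


tracker = UsageTracker()
tracker.on_usage(Usage(input_tokens=1_200, output_tokens=300))
# The second request resends the history, so its input already includes turn one.
tracker.on_usage(Usage(input_tokens=1_650, output_tokens=400))
after_two_turns = (tracker.token_count, tracker.accumulated.output_tokens)
tracker.reset()
after_compaction = tracker.token_count
print(after_two_turns, after_compaction)  # (1650, 700) 0
```

One consequence of the reset: token_count reads 0 until the next provider response arrives, so a threshold check made immediately after compaction cannot trigger a second compaction.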

4. ChatAgent Integration

# In python/packages/core/agent_framework/_chat_agent.py

from agent_framework.compaction import AutoCompactionConfig, USE_AGENT_DEFAULT


class ChatAgent(BaseAgent):
    def __init__(
        self,
        *,
        chat_client: ChatClientProtocol,
        name: str | None = None,
        instructions: str | None = None,
        tools: ToolSet | None = None,
        context_providers: ContextProvider | Sequence[ContextProvider] | None = None,
        middleware: AgentMiddleware | Sequence[AgentMiddleware] | None = None,
        compaction: AutoCompactionConfig | None = None,  # Agent-level default
        # ... existing parameters ...
        **kwargs,
    ):
        super().__init__(...)
        self._compaction = compaction
        # ... rest of init ...
    
    async def run(
        self,
        input: str | ChatMessage | Sequence[ChatMessage],
        *,
        thread: AgentThread | None = None,
        compaction: AutoCompactionConfig | None | object = USE_AGENT_DEFAULT,  # Per-run config
        **kwargs,
    ) -> AgentRunResponse:
        """
        Run the agent with optional auto-compaction.
        
        Args:
            input: The input message(s) to process.
            thread: The conversation thread.
            compaction: Compaction configuration for this run.
                - AutoCompactionConfig: Use this specific config
                - None: Disable compaction for this run
                - USE_AGENT_DEFAULT (default): Use agent's default config
            **kwargs: Additional arguments passed to chat client.
        
        Returns:
            AgentRunResponse with the agent's response.
        """
        # Resolve compaction config: run-level overrides agent-level
        effective_compaction = (
            self._compaction if compaction is USE_AGENT_DEFAULT else compaction
        )
        
        # Auto-compact before run if configured
        if effective_compaction and effective_compaction.trigger == "before_run":
            if thread and thread.token_count > effective_compaction.threshold:
                await thread.compact(
                    effective_compaction.compactor,
                    effective_compaction.options,
                )
        
        # Prepare and execute (existing logic)
        thread = thread or self.get_new_thread()
        thread_messages, chat_options = await self._prepare_thread_and_messages(
            thread, input, **kwargs
        )
        
        response = await self._chat_client.get_response(
            messages=thread_messages,
            **chat_options,
        )
        
        # Track usage from response
        if thread and response.usage_details:
            await thread.on_usage(response.usage_details)
        
        await self._notify_thread_of_new_messages(thread, input, response.messages)
        
        # Auto-compact after run if configured
        if effective_compaction and effective_compaction.trigger == "after_run":
            if thread and thread.token_count > effective_compaction.threshold:
                await thread.compact(
                    effective_compaction.compactor,
                    effective_compaction.options,
                )
        
        return AgentRunResponse(
            messages=response.messages,
            thread=thread,
            usage_details=response.usage_details,
        )
    
    async def run_stream(
        self,
        input: str | ChatMessage | Sequence[ChatMessage],
        *,
        thread: AgentThread | None = None,
        compaction: AutoCompactionConfig | None | object = USE_AGENT_DEFAULT,
        **kwargs,
    ) -> AsyncIterator[AgentRunResponseUpdate]:
        """
        Stream the agent's response with optional auto-compaction.
        
        Same compaction behavior as run() - checks threshold before/after
        based on configuration.
        """
        effective_compaction = (
            self._compaction if compaction is USE_AGENT_DEFAULT else compaction
        )
        
        # Auto-compact before run
        if effective_compaction and effective_compaction.trigger == "before_run":
            if thread and thread.token_count > effective_compaction.threshold:
                await thread.compact(
                    effective_compaction.compactor,
                    effective_compaction.options,
                )
        
        thread = thread or self.get_new_thread()
        thread_messages, chat_options = await self._prepare_thread_and_messages(
            thread, input, **kwargs
        )
        
        collected_messages = []
        usage_details = None
        
        async for update in self._chat_client.get_streaming_response(
            messages=thread_messages,
            **chat_options,
        ):
            if update.messages:
                collected_messages.extend(update.messages)
            if update.usage_details:
                usage_details = update.usage_details
            yield AgentRunResponseUpdate(...)
        
        # Track usage after stream completes
        if thread and usage_details:
            await thread.on_usage(usage_details)
        
        await self._notify_thread_of_new_messages(thread, input, collected_messages)
        
        # Auto-compact after run
        if effective_compaction and effective_compaction.trigger == "after_run":
            if thread and thread.token_count > effective_compaction.threshold:
                await thread.compact(
                    effective_compaction.compactor,
                    effective_compaction.options,
                )

5. Built-in Compactors

TruncationCompactor
# In python/packages/core/agent_framework/compaction/_truncation.py

from typing import Literal, Sequence

from pydantic import Field


class TruncationOptions(CompactionOptions):
    """Options specific to truncation compaction."""
    
    preserve_recent: int = Field(
        default=2,
        description="Number of recent message pairs to always preserve"
    )
    preserve_system: bool = Field(
        default=True,
        description="Whether to preserve system messages"
    )
    strategy: Literal["aggressive", "moderate", "conservative"] = Field(
        default="moderate",
        description="How aggressively to truncate"
    )


class TruncationCompactor:
    """
    Simple FIFO truncation compactor.
    
    Drops oldest messages first while preserving system messages
    and recent conversation context.
    
    Example:
        compactor = TruncationCompactor()
        compactor = TruncationCompactor(TruncationOptions(preserve_recent=5))
    """
    
    def __init__(self, options: TruncationOptions | None = None):
        self.default_options = options or TruncationOptions()
    
    async def compact(
        self,
        messages: Sequence[ChatMessage],
        options: CompactionOptions,
    ) -> Sequence[ChatMessage]:
        opts = self._merge_options(options)
        
        if not messages:
            return messages
        
        # Separate system messages if preserving
        system_msgs: list[ChatMessage] = []
        other_msgs = list(messages)
        
        if opts.preserve_system:
            system_msgs = [m for m in messages if m.role == "system"]
            other_msgs = [m for m in messages if m.role != "system"]
        
        if not other_msgs:
            return messages
        
        # Always keep recent messages
        preserve = opts.preserve_recent * 2  # user + assistant pairs
        if preserve >= len(other_msgs):
            return messages

        # Split on an explicit index: other_msgs[-0:] would return the whole list
        split = len(other_msgs) - preserve
        protected = other_msgs[split:]
        candidates = other_msgs[:split]
        
        # Determine how many to keep based on strategy
        keep_ratio = {
            "aggressive": 0.25,
            "moderate": 0.5,
            "conservative": 0.75,
        }[opts.strategy]
        
        keep_count = int(len(candidates) * keep_ratio)
        kept = candidates[-keep_count:] if keep_count > 0 else []
        
        return system_msgs + kept + protected
    
    def _merge_options(self, options: CompactionOptions) -> TruncationOptions:
        if isinstance(options, TruncationOptions):
            return options
        return TruncationOptions(
            max_tokens=options.max_tokens,
            **self.default_options.model_dump(exclude={'max_tokens'})
        )
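
To make the truncation arithmetic concrete, here is a self-contained sketch of the same selection logic applied to a 21-message history. Messages are modelled as hypothetical (role, text) tuples rather than ChatMessage objects, and `truncate` is an illustrative function, not the proposed class.

```python
Message = tuple[str, str]  # (role, text); hypothetical stand-in for ChatMessage

KEEP_RATIO = {"aggressive": 0.25, "moderate": 0.5, "conservative": 0.75}


def truncate(messages: list[Message], preserve_recent: int = 2, strategy: str = "moderate") -> list[Message]:
    """Keep system messages, the newest share of older messages, and the protected tail."""
    system = [m for m in messages if m[0] == "system"]
    others = [m for m in messages if m[0] != "system"]
    preserve = preserve_recent * 2  # user + assistant pairs
    if preserve >= len(others):
        return list(messages)
    split = len(others) - preserve
    candidates, protected = others[:split], others[split:]
    keep_count = int(len(candidates) * KEEP_RATIO[strategy])
    kept = candidates[len(candidates) - keep_count:]  # newest candidates survive
    return system + kept + protected


history: list[Message] = [("system", "You are helpful.")]
for turn in range(10):
    history.append(("user", f"question {turn}"))
    history.append(("assistant", f"answer {turn}"))

moderate = truncate(history)
aggressive = truncate(history, strategy="aggressive")
print(len(history), len(moderate), len(aggressive))  # 21 13 9
```

With 20 non-system messages and preserve_recent=2, four messages are protected and 16 are candidates. "moderate" keeps 8 of them (13 messages in total with the system message), "aggressive" keeps 4 (9 in total). The result depends on message counts only; max_tokens plays no part in the selection.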

SummarizationCompactor
# In python/packages/core/agent_framework/compaction/_summarization.py


class SummarizationOptions(CompactionOptions):
    """Options specific to summarization compaction."""
    
    preserve_recent: int = Field(
        default=3,
        description="Number of recent message pairs to preserve unchanged"
    )
    summary_prompt: str = Field(
        default=(
            "Summarize this conversation concisely, preserving:\n"
            "- Key decisions made\n"
            "- Important context and requirements\n"
            "- Current task state and next steps"
        ),
        description="Prompt used for summarization"
    )
    summary_model_id: str | None = Field(
        default=None,
        description="Model to use for summarization (defaults to agent's model)"
    )


class SummarizationCompactor:
    """
    LLM-powered summarization compactor.
    
    Uses a chat client to summarize older messages into a condensed
    system message, preserving recent conversation context.
    
    Example:
        compactor = SummarizationCompactor(chat_client)
        compactor = SummarizationCompactor(
            chat_client,
            SummarizationOptions(preserve_recent=5)
        )
    """
    
    def __init__(
        self,
        chat_client: ChatClientProtocol,
        options: SummarizationOptions | None = None,
    ):
        self._chat_client = chat_client
        self.default_options = options or SummarizationOptions()
    
    async def compact(
        self,
        messages: Sequence[ChatMessage],
        options: CompactionOptions,
    ) -> Sequence[ChatMessage]:
        opts = self._merge_options(options)
        
        if not messages:
            return messages
        
        preserve = opts.preserve_recent * 2  # user + assistant pairs
        if preserve >= len(messages):
            return messages

        # Split on an explicit index: messages[-0:] would return the whole list
        split = len(messages) - preserve
        protected = list(messages[split:])
        to_summarize = list(messages[:split])
        
        if not to_summarize:
            return protected
        
        # Build conversation text for summarization
        conversation = "\n".join(
            f"{m.role}: {self._get_text(m)}"
            for m in to_summarize
        )
        
        # Call LLM for summary
        summary_request = [
            ChatMessage(
                role="system",
                contents=[TextContent(text=opts.summary_prompt)]
            ),
            ChatMessage(
                role="user",
                contents=[TextContent(text=conversation)]
            ),
        ]
        
        response = await self._chat_client.get_response(
            messages=summary_request,
            model_id=opts.summary_model_id,
        )
        
        summary_text = self._get_text(response.messages[0]) if response.messages else ""
        
        summary_message = ChatMessage(
            role="system",
            contents=[TextContent(
                text=f"[Conversation Summary]\n{summary_text}"
            )],
        )
        
        return [summary_message] + protected
    
    def _get_text(self, message: ChatMessage) -> str:
        return ' '.join(
            c.text for c in message.contents
            if hasattr(c, 'text')
        )
    
    def _merge_options(self, options: CompactionOptions) -> SummarizationOptions:
        if isinstance(options, SummarizationOptions):
            return options
        return SummarizationOptions(
            max_tokens=options.max_tokens,
            **self.default_options.model_dump(exclude={'max_tokens'})
        )
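
The shape of the summarization result can be checked without calling a model. In the sketch below the chat client is replaced by a hypothetical `fake_summarizer` coroutine, and messages are (role, text) tuples; only the split-then-prepend structure is taken from the class above.

```python
import asyncio
from typing import Awaitable, Callable

Message = tuple[str, str]  # (role, text); hypothetical stand-in for ChatMessage
Summarizer = Callable[[str], Awaitable[str]]


async def summarize_older(
    messages: list[Message], preserve_recent: int, summarizer: Summarizer
) -> list[Message]:
    """Replace everything except the protected tail with one summary message."""
    preserve = preserve_recent * 2  # user + assistant pairs
    if preserve >= len(messages):
        return list(messages)
    split = len(messages) - preserve
    older, recent = messages[:split], messages[split:]
    transcript = "\n".join(f"{role}: {text}" for role, text in older)
    summary = await summarizer(transcript)
    return [("system", f"[Conversation Summary]\n{summary}")] + list(recent)


async def fake_summarizer(transcript: str) -> str:
    """Stub standing in for the chat-client call; it just counts transcript lines."""
    return f"{len(transcript.splitlines())} earlier messages omitted"


history: list[Message] = []
for turn in range(5):
    history += [("user", f"q{turn}"), ("assistant", f"a{turn}")]

compacted = asyncio.run(summarize_older(history, preserve_recent=1, summarizer=fake_summarizer))
```

Ten messages collapse to three: one summary plus the last user/assistant pair. Unlike truncation, this path costs an extra model call each time the threshold is crossed, and any system messages in the older part are folded into the summary rather than preserved verbatim.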

ManagedCompactor (Provider-Specific)
# In python/packages/core/agent_framework/compaction/_managed.py


class CompactionProviderProtocol(Protocol):
    """Protocol for provider-specific compaction endpoints."""
    
    async def compact(
        self,
        messages: Sequence[ChatMessage],
        options: CompactionOptions,
    ) -> Sequence[ChatMessage]:
        ...


class ManagedCompactor:
    """
    Provider-managed compactor.
    
    Delegates to provider-specific endpoints like OpenAI /responses/compact.
    
    Example:
        from agent_framework_openai import OpenAICompactionProvider
        
        provider = OpenAICompactionProvider(client, model_id="gpt-5.1-codex-max")
        compactor = ManagedCompactor(provider)
    """
    
    def __init__(self, provider: CompactionProviderProtocol):
        self._provider = provider
    
    async def compact(
        self,
        messages: Sequence[ChatMessage],
        options: CompactionOptions,
    ) -> Sequence[ChatMessage]:
        return await self._provider.compact(messages, options)

.NET Implementation

1. Auto-Compaction Configuration

// In dotnet/src/Microsoft.Agents.AI.Abstractions/Compaction/AutoCompactionConfig.cs

namespace Microsoft.Agents.AI;

/// <summary>
/// Configuration for automatic thread compaction.
/// Can be set at agent-level (default) or per-run (override).
/// </summary>
public class AutoCompactionConfig
{
    /// <summary>
    /// The compactor to use for automatic compaction.
    /// </summary>
    public required ICompactor Compactor { get; init; }

    /// <summary>
    /// Token count that triggers automatic compaction.
    /// </summary>
    public int Threshold { get; init; } = 100_000;

    /// <summary>
    /// When to check and trigger compaction.
    /// </summary>
    public CompactionTrigger Trigger { get; init; } = CompactionTrigger.BeforeRun;

    /// <summary>
    /// Optional compactor-specific options.
    /// </summary>
    public CompactionOptions? Options { get; init; }
}

public enum CompactionTrigger
{
    BeforeRun,
    AfterRun
}

/// <summary>
/// Base compaction options.
/// </summary>
public class CompactionOptions
{
    public int MaxTokens { get; init; } = 128_000;
    
    public virtual CompactionOptions Clone() => new()
    {
        MaxTokens = MaxTokens
    };
}

public record CompactionResult(
    int OriginalCount,
    int CompactedCount,
    int? OriginalTokens = null,
    int? CompactedTokens = null);

2. ICompactor Interface

namespace Microsoft.Agents.AI;

/// <summary>
/// Interface for thread compaction strategies.
/// </summary>
public interface ICompactor
{
    Task<IReadOnlyList<ChatMessage>> CompactAsync(
        IReadOnlyList<ChatMessage> messages,
        CompactionOptions options,
        CancellationToken cancellationToken = default);
}

3. Token Tracking on AgentThread

// In AgentThread.cs

public abstract class AgentThread
{
    private UsageDetails? _accumulatedUsage;
    
    /// <summary>
    /// Gets the input token count from the most recent response (context window size).
    /// </summary>
    public long TokenCount => _accumulatedUsage?.InputTokenCount ?? 0;  // InputTokenCount is long?

    /// <summary>
    /// Gets the accumulated usage details.
    /// </summary>
    public UsageDetails? UsageDetails => _accumulatedUsage;

    /// <summary>
    /// Called after each agent run to track token usage.
    /// </summary>
    public virtual void OnUsage(UsageDetails? usage)
    {
        if (usage == null) return;
        
        if (_accumulatedUsage == null)
        {
            _accumulatedUsage = new UsageDetails
            {
                InputTokenCount = usage.InputTokenCount,
                OutputTokenCount = usage.OutputTokenCount,
                TotalTokenCount = usage.TotalTokenCount,
            };
        }
        else
        {
            _accumulatedUsage = new UsageDetails
            {
                InputTokenCount = usage.InputTokenCount,
                OutputTokenCount = (_accumulatedUsage.OutputTokenCount ?? 0) + 
                                   (usage.OutputTokenCount ?? 0),
                TotalTokenCount = usage.TotalTokenCount,
            };
        }
    }
    
    /// <summary>
    /// Compacts the thread's message history.
    /// </summary>
    public virtual Task<CompactionResult> CompactAsync(
        ICompactor compactor,
        CompactionOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        return Task.FromResult(new CompactionResult(0, 0));
    }

    /// <summary>
    /// Resets the accumulated usage tracking.
    /// </summary>
    protected void ResetUsage() => _accumulatedUsage = null;
}

4. Run Options Extension

// In ChatClientAgentOptions.cs - Agent-level default
public class ChatClientAgentOptions
{
    // ... existing properties ...
    
    /// <summary>
    /// Default compaction configuration for all runs.
    /// Can be overridden per-run via ChatClientAgentRunOptions.
    /// </summary>
    public AutoCompactionConfig? Compaction { get; init; }
}

// In ChatClientAgentRunOptions.cs - Per-run override
public class ChatClientAgentRunOptions : AgentRunOptions
{
    // ... existing properties ...
    
    /// <summary>
    /// Compaction configuration for this run.
    /// Set to override agent default, or null to disable compaction.
    /// </summary>
    public AutoCompactionConfig? Compaction { get; init; }

    /// <summary>
    /// Whether to use agent's default compaction config.
    /// When false, Compaction property is used (even if null).
    /// </summary>
    public bool UseDefaultCompaction { get; init; } = true;
}

5. ChatClientAgent with Run-Level Compaction

// In ChatClientAgent.cs

public class ChatClientAgent : AIAgent
{
    private readonly ChatClientAgentOptions _options;
    
    private async Task<AgentRunResponse> RunCoreAsync(
        IEnumerable<ChatMessage>? messages,
        ChatClientAgentThread thread,
        ChatClientAgentRunOptions? runOptions,
        CancellationToken cancellationToken)
    {
        // Resolve effective compaction config: run-level overrides agent-level
        var effectiveCompaction = ResolveCompactionConfig(runOptions);
        
        // Auto-compaction before run
        if (effectiveCompaction is { Trigger: CompactionTrigger.BeforeRun })
        {
            if (thread.TokenCount > effectiveCompaction.Threshold)
            {
                await thread.CompactAsync(
                    effectiveCompaction.Compactor,
                    effectiveCompaction.Options,
                    cancellationToken);
            }
        }
        
        // Prepare messages (existing logic)
        var (inputMessagesForChatClient, chatOptions) = await PrepareThreadAndMessagesAsync(
            thread, messages, runOptions, cancellationToken);
        
        // Call chat client
        var chatResponse = await chatClient.GetResponseAsync(
            inputMessagesForChatClient,
            chatOptions,
            cancellationToken);
        
        // Track usage from response
        if (chatResponse.Usage != null)
        {
            thread.OnUsage(chatResponse.Usage);
        }
        
        // Update message store (existing logic)
        await NotifyMessageStoreOfNewMessagesAsync(thread, messages, chatResponse.Messages, cancellationToken);
        
        // Auto-compaction after run
        if (effectiveCompaction is { Trigger: CompactionTrigger.AfterRun })
        {
            if (thread.TokenCount > effectiveCompaction.Threshold)
            {
                await thread.CompactAsync(
                    effectiveCompaction.Compactor,
                    effectiveCompaction.Options,
                    cancellationToken);
            }
        }
        
        return new AgentRunResponse(chatResponse.Messages, thread, chatResponse.Usage);
    }
    
    private AutoCompactionConfig? ResolveCompactionConfig(ChatClientAgentRunOptions? runOptions)
    {
        // If run options explicitly set compaction (even to null), use that
        if (runOptions != null && !runOptions.UseDefaultCompaction)
        {
            return runOptions.Compaction;
        }
        
        // If run options provide compaction, use it
        if (runOptions?.Compaction != null)
        {
            return runOptions.Compaction;
        }
        
        // Fall back to agent default
        return _options.Compaction;
    }
}

6. Built-in Compactors

// TruncationCompactor.cs
public class TruncationCompactor : ICompactor
{
    private readonly TruncationOptions _defaultOptions;
    
    public TruncationCompactor(TruncationOptions? options = null)
    {
        _defaultOptions = options ?? new TruncationOptions();
    }
    
    public Task<IReadOnlyList<ChatMessage>> CompactAsync(
        IReadOnlyList<ChatMessage> messages,
        CompactionOptions options,
        CancellationToken cancellationToken = default)
    {
        var opts = MergeOptions(options);
        // ... implementation ...
    }
}

public class TruncationOptions : CompactionOptions
{
    public int PreserveRecent { get; init; } = 2;
    public bool PreserveSystem { get; init; } = true;
    public TruncationStrategy Strategy { get; init; } = TruncationStrategy.Moderate;
}

public enum TruncationStrategy
{
    Aggressive,   // Keep 25% of candidates
    Moderate,     // Keep 50% of candidates
    Conservative  // Keep 75% of candidates
}

// SummarizationCompactor.cs
public class SummarizationCompactor : ICompactor
{
    private readonly IChatClient _chatClient;
    private readonly SummarizationOptions _defaultOptions;
    
    public SummarizationCompactor(
        IChatClient chatClient,
        SummarizationOptions? options = null)
    {
        _chatClient = chatClient;
        _defaultOptions = options ?? new SummarizationOptions();
    }
    
    // ... implementation ...
}

public class SummarizationOptions : CompactionOptions
{
    public int PreserveRecent { get; init; } = 3;
    public string SummaryPrompt { get; init; } = "Summarize concisely...";
    public string? SummaryModelId { get; init; }
}
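
To make the three truncation strategies concrete, here is a minimal Python sketch of how the selection could work. It is not the proposed implementation (which is elided above): `Message` is a hypothetical stand-in for `ChatMessage`, `truncate` is an illustrative helper name, and the choice to keep the newest fraction of candidates is an assumption based on the "FIFO truncation" description. Only the 25% / 50% / 75% fractions and the `preserve_recent` / `preserve_system` defaults come from the proposal.

```python
from dataclasses import dataclass

# Fractions taken from the TruncationStrategy comments in the proposal.
KEEP_FRACTION = {"aggressive": 0.25, "moderate": 0.50, "conservative": 0.75}


@dataclass
class Message:
    """Hypothetical stand-in for the framework's ChatMessage."""
    role: str
    text: str


def truncate(messages, preserve_recent=2, preserve_system=True, strategy="moderate"):
    """Drop the oldest candidates, keeping the newest fraction set by `strategy`."""
    # The most recent messages are always kept verbatim.
    split = max(len(messages) - preserve_recent, 0)
    head, recent = messages[:split], messages[split:]

    # Candidates are the older messages that are allowed to be dropped.
    candidates = [m for m in head if not (preserve_system and m.role == "system")]
    keep_count = int(len(candidates) * KEEP_FRACTION[strategy])
    kept_ids = {id(m) for m in candidates[len(candidates) - keep_count:]}

    # Rebuild in original order: protected system messages and surviving candidates.
    survivors = [
        m for m in head
        if id(m) in kept_ids or (preserve_system and m.role == "system")
    ]
    return survivors + recent
```

With one system message followed by eight conversation messages and the default options, six of the eight are candidates; "moderate" keeps the newest three of them, so the result is the system message, three older messages, and the two most recent ones.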

Usage Examples

Python: Explicit Opt-In Per-Run

from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient
from agent_framework.compaction import (
    AutoCompactionConfig,
    TruncationCompactor,
    TruncationOptions,
    SummarizationCompactor,
)

client = OpenAIChatClient()
agent = ChatAgent(chat_client=client, name="Assistant")
thread = agent.get_new_thread()

# Default: No compaction
async for event in agent.run_stream("Hello!", thread=thread):
    print(event.content, end="")  # No compaction happens

# Explicitly enable compaction for this run
async for event in agent.run_stream(
    "Let's have a long conversation...",
    thread=thread,
    compaction=AutoCompactionConfig(
        compactor=TruncationCompactor(),
        threshold=100_000,
    )
):
    print(event.content, end="")

# Different compaction strategy for different runs
async for event in agent.run_stream(
    "Summarize our conversation",
    thread=thread,
    compaction=AutoCompactionConfig(
        compactor=SummarizationCompactor(client),
        threshold=50_000,
    )
):
    print(event.content, end="")

Python: Agent-Level Defaults (Convenience)

# Set agent-level default for convenience
agent = ChatAgent(
    chat_client=client,
    name="Assistant",
    compaction=AutoCompactionConfig(
        compactor=TruncationCompactor(),
        threshold=100_000,
    )
)

thread = agent.get_new_thread()

# Uses agent default (compaction enabled)
async for event in agent.run_stream("Hello!", thread=thread):
    print(event.content, end="")

# Override with different config for this run
async for event in agent.run_stream(
    "Let's summarize",
    thread=thread,
    compaction=AutoCompactionConfig(
        compactor=SummarizationCompactor(client),
        threshold=80_000,
    )
):
    print(event.content, end="")

# Disable compaction for this specific run
async for event in agent.run_stream(
    "Quick follow-up",
    thread=thread,
    compaction=None,  # Explicitly disable
):
    print(event.content, end="")

# Token count is always tracked (even without compaction)
print(f"Current tokens: {thread.token_count}")

.NET: Explicit Opt-In Per-Run

using Microsoft.Agents.Framework.Compaction;

var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Instructions = "You are a helpful assistant."
});

var thread = agent.GetNewThread();

// Default: No compaction
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
    Console.Write(msg.Content);  // No compaction happens

// Explicitly enable compaction for this run
await foreach (var msg in agent.RunStreamingAsync(
    "Let's have a long conversation...",
    thread,
    new ChatClientAgentRunOptions
    {
        Compaction = new AutoCompactionConfig
        {
            Compactor = new TruncationCompactor(),
            Threshold = 100_000
        }
    }))
{
    Console.Write(msg.Content);
}

// Different config for this run
await foreach (var msg in agent.RunStreamingAsync(
    "Summarize please",
    thread,
    new ChatClientAgentRunOptions
    {
        Compaction = new AutoCompactionConfig
        {
            Compactor = new SummarizationCompactor(chatClient),
            Threshold = 50_000
        }
    }))
{
    Console.Write(msg.Content);
}

.NET: Agent-Level Defaults (Convenience)

// Set agent-level default for convenience
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Instructions = "You are a helpful assistant.",
    Compaction = new AutoCompactionConfig
    {
        Compactor = new TruncationCompactor(),
        Threshold = 100_000
    }
});

var thread = agent.GetNewThread();

// Uses agent default (compaction enabled)
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
    Console.Write(msg.Content);

// Override for this run
await foreach (var msg in agent.RunStreamingAsync(
    "Summarize",
    thread,
    new ChatClientAgentRunOptions
    {
        Compaction = new AutoCompactionConfig
        {
            Compactor = new SummarizationCompactor(chatClient),
            Threshold = 80_000
        }
    }))
{
    Console.Write(msg.Content);
}

// Disable compaction for this run (even though agent has default)
await foreach (var msg in agent.RunStreamingAsync(
    "Quick question",
    thread,
    new ChatClientAgentRunOptions
    {
        UseDefaultCompaction = false  // Explicitly disable
    }))
{
    Console.Write(msg.Content);
}

Console.WriteLine($"Current tokens: {thread.TokenCount}");

Custom Compactor Options

# Truncation with custom settings
compaction = AutoCompactionConfig(
    compactor=TruncationCompactor(
        TruncationOptions(
            preserve_recent=5,        # Keep last 5 message pairs
            preserve_system=True,     # Always keep system messages
            strategy="conservative",  # Keep 75% of old messages
        )
    ),
    threshold=100_000,
    trigger="before_run",
)

# Summarization with custom prompt
compaction = AutoCompactionConfig(
    compactor=SummarizationCompactor(
        client,
        SummarizationOptions(
            preserve_recent=3,
            summary_prompt="Summarize key decisions and current task state.",
            summary_model_id="gpt-4o-mini",  # Use cheaper model for summaries
        )
    ),
    threshold=80_000,
)
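
For comparison with truncation, the following is a rough sketch of what a summarization pass might do with `preserve_recent`. Everything here is an assumption for illustration: `Message`, `summarize_compact`, and `fake_summarize` are hypothetical names, the summary is inserted as an assistant message, and system messages are kept even though `SummarizationOptions` does not define a `preserve_system` flag. A real compactor would call the chat client with `summary_prompt` instead of the stub.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for the framework's ChatMessage."""
    role: str
    text: str


async def summarize_compact(messages, summarize, preserve_recent=3):
    """Replace older non-system messages with a single summary message."""
    split = max(len(messages) - preserve_recent, 0)
    head, recent = messages[:split], messages[split:]
    system = [m for m in head if m.role == "system"]
    older = [m for m in head if m.role != "system"]
    if not older:
        return list(messages)  # nothing old enough to summarize

    # Flatten the older turns into a transcript for the summarizer.
    transcript = "\n".join(f"{m.role}: {m.text}" for m in older)
    summary = await summarize(transcript)
    note = Message("assistant", f"Summary of earlier conversation: {summary}")
    return system + [note] + recent


async def fake_summarize(transcript: str) -> str:
    """Stub for an LLM call; only reports how many lines it received."""
    return f"{len(transcript.splitlines())} earlier messages condensed"
```

Calling `asyncio.run(summarize_compact(history, fake_summarize))` returns the system messages, one summary message, and the last three messages unchanged.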

Manual Compaction (Always Available)

# Without auto-compaction, you can still compact manually
agent = ChatAgent(chat_client=client, name="Assistant")  # No compaction config
thread = agent.get_new_thread()

# Run conversations (no auto-compaction)
response = await agent.run("Hello!", thread=thread)
response = await agent.run("Tell me more", thread=thread)

# Token count is still tracked automatically
if thread.usage_details:
    print(f"Input tokens: {thread.usage_details.input_token_count}")
    print(f"Output tokens: {thread.usage_details.output_token_count}")

# Manually compact when you decide
if thread.token_count > 100_000:
    result = await thread.compact(TruncationCompactor())
    print(f"Compacted from {result.original_count} to {result.compacted_count} messages")

OpenAI /compact Integration

# In agent_framework_openai/compaction.py

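from collections.abc import Sequence

from agent_framework import ChatMessage
from agent_framework.compaction import (
    CompactionOptions,
    CompactionProviderProtocol,
    ManagedCompactor,
)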


class OpenAICompactionProvider:
    """OpenAI /responses/compact endpoint integration."""
    
    def __init__(self, client, model_id: str = "gpt-5.1-codex-max"):
        self._client = client
        self._model_id = model_id
    
    async def compact(
        self,
        messages: Sequence[ChatMessage],
        options: CompactionOptions,
    ) -> Sequence[ChatMessage]:
        response = await self._client.responses.compact(
            model=self._model_id,
            input=self._to_openai_format(messages),
        )
        return self._from_response(response)


# Usage (`client` is whatever object exposes `responses.compact`, as called above)
agent = ChatAgent(
    chat_client=OpenAIChatClient(),
    compaction=AutoCompactionConfig(
        compactor=ManagedCompactor(
            OpenAICompactionProvider(client, "gpt-5.1-codex-max")
        ),
        threshold=100_000,
    )
)
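
The proposal does not spell out `ManagedCompactor` or `CompactionProviderProtocol`, so the following is only a sketch of the delegation idea under assumed shapes. `CompactionProvider`, `ManagedCompactorSketch`, and `KeepLastProvider` are hypothetical names, and messages are plain strings to keep the example self-contained.

```python
import asyncio
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class CompactionProvider(Protocol):
    """Assumed shape of CompactionProviderProtocol: one async compact() method."""

    async def compact(self, messages: Sequence[str], options: dict) -> Sequence[str]:
        ...


class ManagedCompactorSketch:
    """Compactor that does no work itself and delegates to a provider."""

    def __init__(self, provider: CompactionProvider):
        self._provider = provider

    async def compact(self, messages, options=None):
        # Normalize the provider's result to a list for callers.
        return list(await self._provider.compact(messages, options or {}))


class KeepLastProvider:
    """Fake provider standing in for a remote endpoint: keeps the last `keep` messages."""

    def __init__(self, keep: int):
        self._keep = keep

    async def compact(self, messages, options):
        return messages[max(len(messages) - self._keep, 0):]
```

Because the provider is matched structurally, any object with a suitable `compact` coroutine (such as `OpenAICompactionProvider` above) could be plugged in without inheriting from a base class.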

Changes Summary

| Component | Change |
| --- | --- |
| ChatAgent.run() / run_stream() | Add optional compaction parameter (default: None = no compaction) |
| ChatAgent.__init__ | Add optional compaction parameter for agent-level default (default: None) |
| AgentThread | Add token_count, usage_details, on_usage(), compact() |
| ChatClientAgentRunOptions (.NET) | Add Compaction property (default: null = no compaction) |
| ChatClientAgentOptions (.NET) | Add Compaction property for agent-level default (default: null) |
| AutoCompactionConfig | Configuration class for enabling auto-compaction |
| CompactorProtocol / ICompactor | Protocol/interface for compaction strategies |
| CompactionOptions | Base options class |
| TruncationCompactor | Built-in FIFO truncation |
| SummarizationCompactor | Built-in LLM summarization |
| ManagedCompactor | Provider-specific delegation |

Note: All compaction parameters default to None/null. Users must explicitly provide an AutoCompactionConfig to enable compaction.


Configuration Precedence

Run-level compaction config (highest priority)
    ↓ (if not provided)
Agent-level compaction config
    ↓ (if not provided)
No compaction (default)

Python:

# Explicit opt-in at run level
agent.run("Hi", compaction=AutoCompactionConfig(...))  # Compaction enabled

# No compaction parameter = no compaction
agent.run("Hi")  # No compaction (default)

# If agent has default, it's used
agent = ChatAgent(..., compaction=default_config)
agent.run("Hi")  # Uses agent default

# Explicitly disable even if agent has default
agent.run("Hi", compaction=None)  # No compaction

C#:

// Explicit opt-in at run level
agent.RunAsync("Hi", thread, new() { Compaction = config });  // Compaction enabled

// No run options = no compaction
agent.RunAsync("Hi", thread);  // No compaction (default)

// If agent has default, it's used
var agent = new ChatClientAgent(..., new() { Compaction = defaultConfig });
agent.RunAsync("Hi", thread);  // Uses agent default

// Explicitly disable even if agent has default
agent.RunAsync("Hi", thread, new() { UseDefaultCompaction = false });  // No compaction
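
One detail the Python examples leave open: if the `compaction` parameter defaults to `None`, then omitting it and passing `compaction=None` look identical inside `run()`, so "use the agent default" and "disable for this run" could not be told apart. The .NET side resolves this with `UseDefaultCompaction`. A possible Python equivalent, shown here purely as an assumption, is a sentinel default; `UNSET` and `resolve_compaction` are hypothetical names.

```python
class _Unset:
    """Sentinel type: distinguishes 'argument omitted' from an explicit None."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


def resolve_compaction(run_level=UNSET, agent_default=None):
    """Apply the precedence: run-level config, then agent default, then none."""
    if run_level is not UNSET:
        # The caller passed something explicitly; None means "disable for this run".
        return run_level
    return agent_default
```

Under this sketch, `run()` would declare `compaction=UNSET` in its signature while `ChatAgent.__init__` could keep `compaction=None`.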

Options Hierarchy

CompactionOptions (base)
└── max_tokens: int = 128_000

TruncationOptions(CompactionOptions)
├── preserve_recent: int = 2
├── preserve_system: bool = True
└── strategy: "aggressive" | "moderate" | "conservative"

SummarizationOptions(CompactionOptions)
├── preserve_recent: int = 3
├── summary_prompt: str
└── summary_model_id: str | None
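
The hierarchy above maps naturally onto dataclasses. This is a sketch rather than the proposed class definitions: the use of frozen dataclasses is an assumption, and the `strategy` and `summary_prompt` defaults are borrowed from the .NET `TruncationOptions` and `SummarizationOptions` shown earlier.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class CompactionOptions:
    """Base options shared by every compactor."""
    max_tokens: int = 128_000


@dataclass(frozen=True)
class TruncationOptions(CompactionOptions):
    preserve_recent: int = 2
    preserve_system: bool = True
    # Default mirrors TruncationStrategy.Moderate on the .NET side.
    strategy: Literal["aggressive", "moderate", "conservative"] = "moderate"


@dataclass(frozen=True)
class SummarizationOptions(CompactionOptions):
    preserve_recent: int = 3
    # Placeholder default copied from the .NET sample.
    summary_prompt: str = "Summarize concisely..."
    summary_model_id: Optional[str] = None
```

Keyword construction such as `TruncationOptions(preserve_recent=5, strategy="conservative")` then matches the usage in "Custom Compactor Options", and each subclass inherits `max_tokens` from the base.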

Token Tracking

The agent tracks actual token usage from provider responses:

# After each agent.run() or agent.run_stream():
# 1. ChatResponse includes usage_details from provider
# 2. Agent calls thread.on_usage(response.usage_details)
# 3. Thread stores input_token_count (= tokens sent as context on the latest call)
# 4. If compaction configured, agent checks threshold and compacts if needed

# Access the tracked count:
thread.token_count      # -> int (current context size in tokens)
thread.usage_details    # -> UsageDetails with full breakdown

No heuristics — uses actual counts from providers.
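
The four steps above could look roughly like this on the thread side. It is a sketch under assumptions: `ThreadSketch` and `should_compact` are hypothetical names, `token_count` is taken to be the latest `input_token_count`, and the strict `>` comparison mirrors the manual example rather than anything the proposal specifies for the automatic check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageDetails:
    """Minimal stand-in for the provider-reported usage breakdown."""
    input_token_count: int
    output_token_count: int


class ThreadSketch:
    """Tracks the most recent provider-reported usage for a conversation."""

    def __init__(self) -> None:
        self.usage_details: Optional[UsageDetails] = None

    @property
    def token_count(self) -> int:
        # Before any run there is no usage yet, so report zero.
        return self.usage_details.input_token_count if self.usage_details else 0

    def on_usage(self, usage: UsageDetails) -> None:
        # Steps 2 and 3: the agent hands over usage and the thread stores it.
        self.usage_details = usage


def should_compact(thread: ThreadSketch, threshold: int) -> bool:
    """Step 4: compare the tracked count against the configured threshold."""
    return thread.token_count > threshold
```

After `thread.on_usage(UsageDetails(120_000, 800))`, `should_compact(thread, 100_000)` is true, and the agent would then invoke the configured compactor.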


Why This Design?

Explicit Opt-In = No Surprises

  • Compaction is disabled by default—no unexpected behavior
  • Users consciously choose when and how to compact
  • Existing code continues to work without changes

Run-Level Configuration = Maximum Flexibility

  • Different compaction strategies for different conversations
  • Enable compaction only when needed
  • Use aggressive compaction when context is filling up
  • Use summarization for end-of-session cleanup

Follows Framework Patterns

  • Same pattern as ChatOptions (agent default + run override)
  • Uses protocols for extensibility (like ChatClientProtocol)
  • Options classes for configuration (like ChatOptions)

Minimal API Surface

  • One config class (AutoCompactionConfig)
  • One protocol (CompactorProtocol)
  • Built-in compactors for common cases

Zero Friction Once Opted In

# Enable compaction - just add one parameter
async for event in agent.run_stream(
    "Hello!",
    thread=thread,
    compaction=AutoCompactionConfig(compactor=TruncationCompactor())
):
    ...
# That's it - no manual checks needed from here on

Full Control Available

  • Manual thread.compact() still available
  • Per-run overrides
  • Custom compactors via protocol
  • Access to raw token counts
