Description
Python + .NET Proposal: Auto-Compaction First-Party Support in AgentThread
Problem
Long-running conversations inevitably exceed context window limits, causing failures or degraded performance. Currently, developers must manually track token counts and implement their own compaction logic—a tedious, error-prone process that every user of the framework has to solve independently.
Context
OpenAI recently made its gpt-5.1-codex-max model publicly available through the Responses endpoint, which offers built-in automatic truncation and a dedicated /responses/compact endpoint.
This proposal adds native auto-compaction capability to AgentThread. Compaction is disabled by default—users must explicitly opt-in. Once enabled, compaction happens automatically with no manual token checks required.
Design Philosophy
Core Principle: Explicit Opt-In, Then Forget
Compaction is disabled by default. Users must explicitly enable it—but once enabled, it works automatically with no manual token checks:
# Python - Explicitly enable compaction per-run
async for event in agent.run_stream(
    "Hello!",
    thread=thread,
    compaction=AutoCompactionConfig(  # Explicit opt-in
        compactor=TruncationCompactor(),
        threshold=100_000,
    ),
):
    print(event.content, end="")
# No compaction parameter = no compaction (default)
async for event in agent.run_stream("Hello!", thread=thread):
    ...  # No compaction happens
# Or set agent-level default for convenience, override per-run
agent = ChatAgent(chat_client=client, compaction=default_config)
async for event in agent.run_stream("Hello!", thread=thread):  # Uses agent default
    ...
async for event in agent.run_stream("Hello!", thread=thread, compaction=None):  # Disable
    ...
// .NET - Explicitly enable compaction per-run
await foreach (var msg in agent.RunStreamingAsync(
    "Hello!",
    thread,
    new ChatClientAgentRunOptions
    {
        Compaction = new AutoCompactionConfig // Explicit opt-in
        {
            Compactor = new TruncationCompactor(),
            Threshold = 100_000
        }
    }))
{
    Console.Write(msg.Content);
}
// No options or no Compaction property = no compaction (default)
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
    Console.Write(msg.Content); // No compaction happens
// Or set agent-level default for convenience
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
    Compaction = defaultConfig // Default for all runs when opted in
});
Design Principles
- Explicit opt-in: Compaction is disabled by default; users must enable it
- Zero manual checks: Once enabled, no `if thread.token_count > threshold` required
- Run-level configuration: Configure compaction per-run for maximum flexibility
- Agent-level defaults: Optional convenience defaults that can be overridden per-run
- Actual token tracking: Uses `UsageDetails` from provider responses (no heuristics)
- Follows existing patterns: Same as ChatOptions (agent default + run override)
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ agent.run() / run_stream() │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Compaction Check (before LLM call) │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ config = run_compaction ?? agent_compaction │ │ │
│ │ │ if config and thread.token_count > threshold: │ │ │
│ │ │ await thread.compact(config.compactor) │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ [LLM Call + Response] │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Usage Tracking (after LLM call) │ │
│ │ thread.on_usage(response.usage_details) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Proposed API
Python Implementation
1. Auto-Compaction Configuration
# In python/packages/core/agent_framework/_compaction.py
from pydantic import BaseModel, Field
from typing import Protocol, Sequence, runtime_checkable, Literal
class CompactionOptions(BaseModel):
"""Base compaction options. Compactors extend this with their own options."""
max_tokens: int = Field(
default=128_000,
description="Maximum context window size for the model"
)
class AutoCompactionConfig(BaseModel):
"""
Configuration for automatic compaction.
Pass to agent.run() or agent.run_stream() to enable automatic
compaction when the token threshold is exceeded.
Can also be set as agent-level default via ChatAgent constructor.
"""
compactor: "CompactorProtocol"
"""The compactor to use for automatic compaction."""
threshold: int = Field(
default=100_000,
description="Token count that triggers automatic compaction"
)
trigger: Literal["before_run", "after_run"] = Field(
default="before_run",
description="When to check and trigger compaction"
)
options: CompactionOptions | None = Field(
default=None,
description="Optional compactor-specific options"
)
model_config = {"arbitrary_types_allowed": True}  # allow the protocol-typed compactor field
class CompactionResult(BaseModel):
"""Result of a compaction operation."""
original_count: int
compacted_count: int
original_tokens: int | None = None
compacted_tokens: int | None = None
# Sentinel value for "use agent default"
USE_AGENT_DEFAULT = object()
2. Compactor Protocol
@runtime_checkable
class CompactorProtocol(Protocol):
"""
Protocol for thread compaction strategies.
Follows framework patterns: ChatMessageStoreProtocol, ChatClientProtocol, etc.
"""
async def compact(
self,
messages: Sequence["ChatMessage"],
options: CompactionOptions,
) -> Sequence["ChatMessage"]:
"""Compact the given messages."""
...
3. Token Usage Tracking on AgentThread
# In python/packages/core/agent_framework/_threads.py
from agent_framework._types import UsageDetails
class AgentThread:
def __init__(
self,
*,
service_thread_id: str | None = None,
message_store: ChatMessageStoreProtocol | None = None,
context_provider: AggregateContextProvider | None = None,
) -> None:
# ... existing ...
self._accumulated_usage: UsageDetails | None = None
@property
def token_count(self) -> int:
"""
Returns the accumulated input token count for this thread.
This represents the approximate context window size based on
actual usage reported by the provider. Returns 0 if no usage
has been tracked yet.
"""
if self._accumulated_usage is None:
return 0
return self._accumulated_usage.input_token_count or 0
@property
def usage_details(self) -> UsageDetails | None:
"""Returns the accumulated usage details for this thread."""
return self._accumulated_usage
async def on_usage(self, usage: UsageDetails) -> None:
"""
Called after each agent run to track token usage.
The input_token_count from the latest response represents
the current context window size.
"""
if usage is None:
return
if self._accumulated_usage is None:
self._accumulated_usage = UsageDetails(
input_token_count=usage.input_token_count or 0,
output_token_count=usage.output_token_count or 0,
total_token_count=usage.total_token_count or 0,
)
else:
# Latest input_token_count = current context size
# Output tokens accumulate across the conversation
self._accumulated_usage = UsageDetails(
input_token_count=usage.input_token_count or 0,
output_token_count=(
(self._accumulated_usage.output_token_count or 0) +
(usage.output_token_count or 0)
),
total_token_count=usage.total_token_count or 0,
)
async def compact(
self,
compactor: CompactorProtocol,
options: CompactionOptions | None = None,
) -> CompactionResult:
"""
Manually compact the thread's message history.
Note: With auto-compaction enabled, you typically don't need
to call this method directly.
Args:
compactor: Any object implementing CompactorProtocol.
options: Compaction options (uses defaults if not provided).
Returns:
CompactionResult with details about the compaction.
Raises:
ValueError: If thread is service-managed.
"""
if self._service_thread_id is not None:
raise ValueError(
"Cannot compact service-managed threads. "
"Compaction is only supported for client-managed threads."
)
if self._message_store is None:
return CompactionResult(original_count=0, compacted_count=0)
options = options or CompactionOptions()
messages = await self._message_store.list_messages()
original_count = len(messages)
original_tokens = self.token_count
compacted = await compactor.compact(messages, options)
compacted_list = list(compacted)
# Update message store
self._message_store._messages = compacted_list
# Reset accumulated usage - next response will provide fresh count
self._accumulated_usage = None
return CompactionResult(
original_count=original_count,
compacted_count=len(compacted_list),
original_tokens=original_tokens,
compacted_tokens=None, # Known after next provider response
)
4. ChatAgent Integration
# In python/packages/core/agent_framework/_chat_agent.py
from agent_framework.compaction import AutoCompactionConfig, USE_AGENT_DEFAULT
class ChatAgent(BaseAgent):
def __init__(
self,
*,
chat_client: ChatClientProtocol,
name: str | None = None,
instructions: str | None = None,
tools: ToolSet = None,
context_providers: ContextProvider | Sequence[ContextProvider] | None = None,
middleware: AgentMiddleware | Sequence[AgentMiddleware] | None = None,
compaction: AutoCompactionConfig | None = None, # Agent-level default
# ... existing parameters ...
**kwargs,
):
super().__init__(...)
self._compaction = compaction
# ... rest of init ...
async def run(
self,
input: str | ChatMessage | Sequence[ChatMessage],
*,
thread: AgentThread | None = None,
compaction: AutoCompactionConfig | None | object = USE_AGENT_DEFAULT, # Per-run config
**kwargs,
) -> AgentRunResponse:
"""
Run the agent with optional auto-compaction.
Args:
input: The input message(s) to process.
thread: The conversation thread.
compaction: Compaction configuration for this run.
- AutoCompactionConfig: Use this specific config
- None: Disable compaction for this run
- USE_AGENT_DEFAULT (default): Use agent's default config
**kwargs: Additional arguments passed to chat client.
Returns:
AgentRunResponse with the agent's response.
"""
# Resolve compaction config: run-level overrides agent-level
effective_compaction = (
self._compaction if compaction is USE_AGENT_DEFAULT else compaction
)
# Auto-compact before run if configured
if effective_compaction and effective_compaction.trigger == "before_run":
if thread and thread.token_count > effective_compaction.threshold:
await thread.compact(
effective_compaction.compactor,
effective_compaction.options,
)
# Prepare and execute (existing logic)
thread = thread or self.get_new_thread()
thread_messages, chat_options = await self._prepare_thread_and_messages(
thread, input, **kwargs
)
response = await self._chat_client.get_response(
messages=thread_messages,
**chat_options,
)
# Track usage from response
if thread and response.usage_details:
await thread.on_usage(response.usage_details)
await self._notify_thread_of_new_messages(thread, input, response.messages)
# Auto-compact after run if configured
if effective_compaction and effective_compaction.trigger == "after_run":
if thread and thread.token_count > effective_compaction.threshold:
await thread.compact(
effective_compaction.compactor,
effective_compaction.options,
)
return AgentRunResponse(
messages=response.messages,
thread=thread,
usage_details=response.usage_details,
)
async def run_stream(
self,
input: str | ChatMessage | Sequence[ChatMessage],
*,
thread: AgentThread | None = None,
compaction: AutoCompactionConfig | None | object = USE_AGENT_DEFAULT,
**kwargs,
) -> AsyncIterator[AgentRunResponseUpdate]:
"""
Stream the agent's response with optional auto-compaction.
Same compaction behavior as run() - checks threshold before/after
based on configuration.
"""
effective_compaction = (
self._compaction if compaction is USE_AGENT_DEFAULT else compaction
)
# Auto-compact before run
if effective_compaction and effective_compaction.trigger == "before_run":
if thread and thread.token_count > effective_compaction.threshold:
await thread.compact(
effective_compaction.compactor,
effective_compaction.options,
)
thread = thread or self.get_new_thread()
thread_messages, chat_options = await self._prepare_thread_and_messages(
thread, input, **kwargs
)
collected_messages = []
usage_details = None
async for update in self._chat_client.get_streaming_response(
messages=thread_messages,
**chat_options,
):
if update.messages:
collected_messages.extend(update.messages)
if update.usage_details:
usage_details = update.usage_details
yield AgentRunResponseUpdate(...)
# Track usage after stream completes
if thread and usage_details:
await thread.on_usage(usage_details)
await self._notify_thread_of_new_messages(thread, input, collected_messages)
# Auto-compact after run
if effective_compaction and effective_compaction.trigger == "after_run":
if thread and thread.token_count > effective_compaction.threshold:
await thread.compact(
effective_compaction.compactor,
effective_compaction.options,
)
5. Built-in Compactors
TruncationCompactor
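As a rough guide to what the listing that follows does, the arithmetic can be isolated into a small helper. `plan_truncation` is a hypothetical function written for this illustration; it mirrors the count-based logic (protect recent pairs, then keep a strategy-dependent fraction of the older messages) and ignores system messages.

```python
def plan_truncation(
    total: int, preserve_recent: int, keep_ratio: float
) -> tuple[int, int, int]:
    """Return (dropped, kept, protected) counts for `total` non-system messages."""
    protected = preserve_recent * 2  # user + assistant pairs
    if protected >= total:
        return 0, 0, total  # nothing old enough to drop
    candidates = total - protected
    kept = int(candidates * keep_ratio)
    return candidates - kept, kept, protected


# 20 non-system messages, 2 protected pairs, one call per strategy ratio
plans = {
    "aggressive": plan_truncation(20, 2, 0.25),
    "moderate": plan_truncation(20, 2, 0.5),
    "conservative": plan_truncation(20, 2, 0.75),
}
for name, (dropped, kept, protected) in plans.items():
    print(f"{name}: drop {dropped}, keep {kept}, protect {protected}")
```

The selection is by message count, not tokens: `max_tokens` is carried through the options but is not consulted by the truncation listing, so a single pass is not guaranteed to bring the thread under the threshold.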
# In python/packages/core/agent_framework/compaction/_truncation.py
from pydantic import Field
class TruncationOptions(CompactionOptions):
"""Options specific to truncation compaction."""
preserve_recent: int = Field(
default=2,
description="Number of recent message pairs to always preserve"
)
preserve_system: bool = Field(
default=True,
description="Whether to preserve system messages"
)
strategy: Literal["aggressive", "moderate", "conservative"] = Field(
default="moderate",
description="How aggressively to truncate"
)
class TruncationCompactor:
"""
Simple FIFO truncation compactor.
Drops oldest messages first while preserving system messages
and recent conversation context.
Example:
compactor = TruncationCompactor()
compactor = TruncationCompactor(TruncationOptions(preserve_recent=5))
"""
def __init__(self, options: TruncationOptions | None = None):
self.default_options = options or TruncationOptions()
async def compact(
self,
messages: Sequence[ChatMessage],
options: CompactionOptions,
) -> Sequence[ChatMessage]:
opts = self._merge_options(options)
if not messages:
return messages
# Separate system messages if preserving
system_msgs: list[ChatMessage] = []
other_msgs = list(messages)
if opts.preserve_system:
system_msgs = [m for m in messages if m.role == "system"]
other_msgs = [m for m in messages if m.role != "system"]
if not other_msgs:
return messages
# Always keep recent messages
preserve = opts.preserve_recent * 2 # user + assistant pairs
if preserve >= len(other_msgs):
return messages
split = len(other_msgs) - preserve  # explicit index: [-0:] would select everything
protected = other_msgs[split:]
candidates = other_msgs[:split]
# Determine how many to keep based on strategy
keep_ratio = {
"aggressive": 0.25,
"moderate": 0.5,
"conservative": 0.75,
}[opts.strategy]
keep_count = int(len(candidates) * keep_ratio)
kept = candidates[-keep_count:] if keep_count > 0 else []
return system_msgs + kept + protected
def _merge_options(self, options: CompactionOptions) -> TruncationOptions:
if isinstance(options, TruncationOptions):
return options
return TruncationOptions(
max_tokens=options.max_tokens,
**self.default_options.model_dump(exclude={'max_tokens'})
)
SummarizationCompactor
# In python/packages/core/agent_framework/compaction/_summarization.py
class SummarizationOptions(CompactionOptions):
"""Options specific to summarization compaction."""
preserve_recent: int = Field(
default=3,
description="Number of recent message pairs to preserve unchanged"
)
summary_prompt: str = Field(
default=(
"Summarize this conversation concisely, preserving:\n"
"- Key decisions made\n"
"- Important context and requirements\n"
"- Current task state and next steps"
),
description="Prompt used for summarization"
)
summary_model_id: str | None = Field(
default=None,
description="Model to use for summarization (defaults to agent's model)"
)
class SummarizationCompactor:
"""
LLM-powered summarization compactor.
Uses a chat client to summarize older messages into a condensed
system message, preserving recent conversation context.
Example:
compactor = SummarizationCompactor(chat_client)
compactor = SummarizationCompactor(
chat_client,
SummarizationOptions(preserve_recent=5)
)
"""
def __init__(
self,
chat_client: ChatClientProtocol,
options: SummarizationOptions | None = None,
):
self._chat_client = chat_client
self.default_options = options or SummarizationOptions()
async def compact(
self,
messages: Sequence[ChatMessage],
options: CompactionOptions,
) -> Sequence[ChatMessage]:
opts = self._merge_options(options)
if not messages:
return messages
preserve = opts.preserve_recent * 2 # user + assistant pairs
if preserve >= len(messages):
return messages
split = len(messages) - preserve  # explicit index: [-0:] would select everything
protected = list(messages[split:])
to_summarize = list(messages[:split])
if not to_summarize:
return protected
# Build conversation text for summarization
conversation = "\n".join(
f"{m.role}: {self._get_text(m)}"
for m in to_summarize
)
# Call LLM for summary
summary_request = [
ChatMessage(
role="system",
contents=[TextContent(text=opts.summary_prompt)]
),
ChatMessage(
role="user",
contents=[TextContent(text=conversation)]
),
]
response = await self._chat_client.get_response(
messages=summary_request,
model_id=opts.summary_model_id,
)
summary_text = self._get_text(response.messages[0]) if response.messages else ""
summary_message = ChatMessage(
role="system",
contents=[TextContent(
text=f"[Conversation Summary]\n{summary_text}"
)],
)
return [summary_message] + protected
def _get_text(self, message: ChatMessage) -> str:
return ' '.join(
c.text for c in message.contents
if hasattr(c, 'text')
)
def _merge_options(self, options: CompactionOptions) -> SummarizationOptions:
if isinstance(options, SummarizationOptions):
return options
return SummarizationOptions(
max_tokens=options.max_tokens,
**self.default_options.model_dump(exclude={'max_tokens'})
)
ManagedCompactor (Provider-Specific)
# In python/packages/core/agent_framework/compaction/_managed.py
class CompactionProviderProtocol(Protocol):
"""Protocol for provider-specific compaction endpoints."""
async def compact(
self,
messages: Sequence[ChatMessage],
options: CompactionOptions,
) -> Sequence[ChatMessage]:
...
class ManagedCompactor:
"""
Provider-managed compactor.
Delegates to provider-specific endpoints like OpenAI /responses/compact.
Example:
from agent_framework_openai import OpenAICompactionProvider
provider = OpenAICompactionProvider(client, model_id="gpt-5.1-codex-max")
compactor = ManagedCompactor(provider)
"""
def __init__(self, provider: CompactionProviderProtocol):
self._provider = provider
async def compact(
self,
messages: Sequence[ChatMessage],
options: CompactionOptions,
) -> Sequence[ChatMessage]:
return await self._provider.compact(messages, options)
.NET Implementation
1. Auto-Compaction Configuration
// In dotnet/src/Microsoft.Agents.AI.Abstractions/Compaction/AutoCompactionConfig.cs
namespace Microsoft.Agents.AI;
/// <summary>
/// Configuration for automatic thread compaction.
/// Can be set at agent-level (default) or per-run (override).
/// </summary>
public class AutoCompactionConfig
{
    /// <summary>
    /// The compactor to use for automatic compaction.
    /// </summary>
    public required ICompactor Compactor { get; init; }
    /// <summary>
    /// Token count that triggers automatic compaction.
    /// </summary>
    public int Threshold { get; init; } = 100_000;
    /// <summary>
    /// When to check and trigger compaction.
    /// </summary>
    public CompactionTrigger Trigger { get; init; } = CompactionTrigger.BeforeRun;
    /// <summary>
    /// Optional compactor-specific options.
    /// </summary>
    public CompactionOptions? Options { get; init; }
}
public enum CompactionTrigger
{
    BeforeRun,
    AfterRun
}
/// <summary>
/// Base compaction options.
/// </summary>
public class CompactionOptions
{
    public int MaxTokens { get; init; } = 128_000;
    public virtual CompactionOptions Clone() => new()
    {
        MaxTokens = MaxTokens
    };
}
public record CompactionResult(
    int OriginalCount,
    int CompactedCount,
    int? OriginalTokens = null,
    int? CompactedTokens = null);
2. ICompactor Interface
namespace Microsoft.Agents.AI;
/// <summary>
/// Interface for thread compaction strategies.
/// </summary>
public interface ICompactor
{
    Task<IReadOnlyList<ChatMessage>> CompactAsync(
        IReadOnlyList<ChatMessage> messages,
        CompactionOptions options,
        CancellationToken cancellationToken = default);
}
3. Token Tracking on AgentThread
// In AgentThread.cs
public abstract class AgentThread
{
    private UsageDetails? _accumulatedUsage;
    /// <summary>
    /// Gets the accumulated input token count (context window size).
    /// </summary>
    public int TokenCount => _accumulatedUsage?.InputTokenCount ?? 0;
    /// <summary>
    /// Gets the accumulated usage details.
    /// </summary>
    public UsageDetails? UsageDetails => _accumulatedUsage;
    /// <summary>
    /// Called after each agent run to track token usage.
    /// </summary>
    public virtual void OnUsage(UsageDetails? usage)
    {
        if (usage == null) return;
        if (_accumulatedUsage == null)
        {
            _accumulatedUsage = new UsageDetails
            {
                InputTokenCount = usage.InputTokenCount,
                OutputTokenCount = usage.OutputTokenCount,
                TotalTokenCount = usage.TotalTokenCount,
            };
        }
        else
        {
            // Latest input count = current context size; output tokens accumulate
            _accumulatedUsage = new UsageDetails
            {
                InputTokenCount = usage.InputTokenCount,
                OutputTokenCount = (_accumulatedUsage.OutputTokenCount ?? 0) +
                    (usage.OutputTokenCount ?? 0),
                TotalTokenCount = usage.TotalTokenCount,
            };
        }
    }
    /// <summary>
    /// Compacts the thread's message history.
    /// </summary>
    public virtual Task<CompactionResult> CompactAsync(
        ICompactor compactor,
        CompactionOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        return Task.FromResult(new CompactionResult(0, 0));
    }
    /// <summary>
    /// Resets the accumulated usage tracking.
    /// </summary>
    protected void ResetUsage() => _accumulatedUsage = null;
}
4. Run Options Extension
// In ChatClientAgentOptions.cs - Agent-level default
public class ChatClientAgentOptions
{
    // ... existing properties ...
    /// <summary>
    /// Default compaction configuration for all runs.
    /// Can be overridden per-run via ChatClientAgentRunOptions.
    /// </summary>
    public AutoCompactionConfig? Compaction { get; init; }
}
// In ChatClientAgentRunOptions.cs - Per-run override
public class ChatClientAgentRunOptions : AgentRunOptions
{
    // ... existing properties ...
    /// <summary>
    /// Compaction configuration for this run.
    /// Set to override agent default, or null to disable compaction.
    /// </summary>
    public AutoCompactionConfig? Compaction { get; init; }
    /// <summary>
    /// Whether to use agent's default compaction config.
    /// When false, Compaction property is used (even if null).
    /// </summary>
    public bool UseDefaultCompaction { get; init; } = true;
}
5. ChatClientAgent with Run-Level Compaction
// In ChatClientAgent.cs
public class ChatClientAgent : AIAgent
{
private readonly ChatClientAgentOptions _options;
private async Task<AgentRunResponse> RunCoreAsync(
IEnumerable<ChatMessage>? messages,
ChatClientAgentThread thread,
ChatClientAgentRunOptions? runOptions,
CancellationToken cancellationToken)
{
// Resolve effective compaction config: run-level overrides agent-level
var effectiveCompaction = ResolveCompactionConfig(runOptions);
// Auto-compaction before run
if (effectiveCompaction is { Trigger: CompactionTrigger.BeforeRun })
{
if (thread.TokenCount > effectiveCompaction.Threshold)
{
await thread.CompactAsync(
effectiveCompaction.Compactor,
effectiveCompaction.Options,
cancellationToken);
}
}
// Prepare messages (existing logic)
var (inputMessagesForChatClient, chatOptions) = await PrepareThreadAndMessagesAsync(
thread, messages, runOptions, cancellationToken);
// Call chat client
var chatResponse = await chatClient.GetResponseAsync(
inputMessagesForChatClient,
chatOptions,
cancellationToken);
// Track usage from response
if (chatResponse.Usage != null)
{
thread.OnUsage(chatResponse.Usage);
}
// Update message store (existing logic)
await NotifyMessageStoreOfNewMessagesAsync(thread, messages, chatResponse.Messages, cancellationToken);
// Auto-compaction after run
if (effectiveCompaction is { Trigger: CompactionTrigger.AfterRun })
{
if (thread.TokenCount > effectiveCompaction.Threshold)
{
await thread.CompactAsync(
effectiveCompaction.Compactor,
effectiveCompaction.Options,
cancellationToken);
}
}
return new AgentRunResponse(chatResponse.Messages, thread, chatResponse.Usage);
}
private AutoCompactionConfig? ResolveCompactionConfig(ChatClientAgentRunOptions? runOptions)
{
// If run options explicitly set compaction (even to null), use that
if (runOptions != null && !runOptions.UseDefaultCompaction)
{
return runOptions.Compaction;
}
// If run options provide compaction, use it
if (runOptions?.Compaction != null)
{
return runOptions.Compaction;
}
// Fall back to agent default
return _options.Compaction;
}
}
6. Built-in Compactors
// TruncationCompactor.cs
public class TruncationCompactor : ICompactor
{
private readonly TruncationOptions _defaultOptions;
public TruncationCompactor(TruncationOptions? options = null)
{
_defaultOptions = options ?? new TruncationOptions();
}
public Task<IReadOnlyList<ChatMessage>> CompactAsync(
IReadOnlyList<ChatMessage> messages,
CompactionOptions options,
CancellationToken cancellationToken = default)
{
var opts = MergeOptions(options);
// ... implementation ...
}
}
public class TruncationOptions : CompactionOptions
{
public int PreserveRecent { get; init; } = 2;
public bool PreserveSystem { get; init; } = true;
public TruncationStrategy Strategy { get; init; } = TruncationStrategy.Moderate;
}
public enum TruncationStrategy
{
Aggressive, // Keep 25% of candidates
Moderate, // Keep 50% of candidates
Conservative // Keep 75% of candidates
}
// SummarizationCompactor.cs
public class SummarizationCompactor : ICompactor
{
private readonly IChatClient _chatClient;
private readonly SummarizationOptions _defaultOptions;
public SummarizationCompactor(
IChatClient chatClient,
SummarizationOptions? options = null)
{
_chatClient = chatClient;
_defaultOptions = options ?? new SummarizationOptions();
}
// ... implementation ...
}
public class SummarizationOptions : CompactionOptions
{
public int PreserveRecent { get; init; } = 3;
public string SummaryPrompt { get; init; } = "Summarize concisely...";
public string? SummaryModelId { get; init; }
}
Usage Examples
Python: Explicit Opt-In Per-Run
from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient
from agent_framework.compaction import (
AutoCompactionConfig,
TruncationCompactor,
TruncationOptions,
SummarizationCompactor,
)
client = OpenAIChatClient()
agent = ChatAgent(chat_client=client, name="Assistant")
thread = agent.get_new_thread()
# Default: No compaction
async for event in agent.run_stream("Hello!", thread=thread):
print(event.content, end="") # No compaction happens
# Explicitly enable compaction for this run
async for event in agent.run_stream(
"Let's have a long conversation...",
thread=thread,
compaction=AutoCompactionConfig(
compactor=TruncationCompactor(),
threshold=100_000,
)
):
print(event.content, end="")
# Different compaction strategy for different runs
async for event in agent.run_stream(
"Summarize our conversation",
thread=thread,
compaction=AutoCompactionConfig(
compactor=SummarizationCompactor(client),
threshold=50_000,
)
):
print(event.content, end="")
Python: Agent-Level Defaults (Convenience)
# Set agent-level default for convenience
agent = ChatAgent(
chat_client=client,
name="Assistant",
compaction=AutoCompactionConfig(
compactor=TruncationCompactor(),
threshold=100_000,
)
)
thread = agent.get_new_thread()
# Uses agent default (compaction enabled)
async for event in agent.run_stream("Hello!", thread=thread):
print(event.content, end="")
# Override with different config for this run
async for event in agent.run_stream(
"Let's summarize",
thread=thread,
compaction=AutoCompactionConfig(
compactor=SummarizationCompactor(client),
threshold=80_000,
)
):
print(event.content, end="")
# Disable compaction for this specific run
async for event in agent.run_stream(
"Quick follow-up",
thread=thread,
compaction=None, # Explicitly disable
):
print(event.content, end="")
# Token count is always tracked (even without compaction)
print(f"Current tokens: {thread.token_count}")
.NET: Explicit Opt-In Per-Run
using Microsoft.Agents.AI; // namespace declared by the compaction types above
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
Instructions = "You are a helpful assistant."
});
var thread = agent.GetNewThread();
// Default: No compaction
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
Console.Write(msg.Content); // No compaction happens
// Explicitly enable compaction for this run
await foreach (var msg in agent.RunStreamingAsync(
"Let's have a long conversation...",
thread,
new ChatClientAgentRunOptions
{
Compaction = new AutoCompactionConfig
{
Compactor = new TruncationCompactor(),
Threshold = 100_000
}
}))
{
Console.Write(msg.Content);
}
// Different config for this run
await foreach (var msg in agent.RunStreamingAsync(
"Summarize please",
thread,
new ChatClientAgentRunOptions
{
Compaction = new AutoCompactionConfig
{
Compactor = new SummarizationCompactor(chatClient),
Threshold = 50_000
}
}))
{
Console.Write(msg.Content);
}
.NET: Agent-Level Defaults (Convenience)
// Set agent-level default for convenience
var agent = new ChatClientAgent(chatClient, new ChatClientAgentOptions
{
Instructions = "You are a helpful assistant.",
Compaction = new AutoCompactionConfig
{
Compactor = new TruncationCompactor(),
Threshold = 100_000
}
});
var thread = agent.GetNewThread();
// Uses agent default (compaction enabled)
await foreach (var msg in agent.RunStreamingAsync("Hello!", thread))
Console.Write(msg.Content);
// Override for this run
await foreach (var msg in agent.RunStreamingAsync(
"Summarize",
thread,
new ChatClientAgentRunOptions
{
Compaction = new AutoCompactionConfig
{
Compactor = new SummarizationCompactor(chatClient),
Threshold = 80_000
}
}))
{
Console.Write(msg.Content);
}
// Disable compaction for this run (even though agent has default)
await foreach (var msg in agent.RunStreamingAsync(
"Quick question",
thread,
new ChatClientAgentRunOptions
{
UseDefaultCompaction = false // Explicitly disable
}))
{
Console.Write(msg.Content);
}
Console.WriteLine($"Current tokens: {thread.TokenCount}");
Custom Compactor Options
# Truncation with custom settings
compaction = AutoCompactionConfig(
compactor=TruncationCompactor(
TruncationOptions(
preserve_recent=5, # Keep last 5 message pairs
preserve_system=True, # Always keep system messages
strategy="conservative", # Keep 75% of old messages
)
),
threshold=100_000,
trigger="before_run",
)
# Summarization with custom prompt
compaction = AutoCompactionConfig(
compactor=SummarizationCompactor(
client,
SummarizationOptions(
preserve_recent=3,
summary_prompt="Summarize key decisions and current task state.",
summary_model_id="gpt-4o-mini", # Use cheaper model for summaries
)
),
threshold=80_000,
)
Manual Compaction (Always Available)
# Without auto-compaction, you can still compact manually
agent = ChatAgent(chat_client=client, name="Assistant") # No compaction config
thread = agent.get_new_thread()
# Run conversations (no auto-compaction)
response = await agent.run("Hello!", thread=thread)
response = await agent.run("Tell me more", thread=thread)
# Token count is still tracked automatically
if thread.usage_details:
print(f"Input tokens: {thread.usage_details.input_token_count}")
print(f"Output tokens: {thread.usage_details.output_token_count}")
# Manually compact when you decide
if thread.token_count > 100_000:
result = await thread.compact(TruncationCompactor())
print(f"Compacted from {result.original_count} to {result.compacted_count} messages")
OpenAI /compact Integration
# In agent_framework_openai/compaction.py
from agent_framework.compaction import ManagedCompactor, CompactionProviderProtocol
class OpenAICompactionProvider:
"""OpenAI /responses/compact endpoint integration."""
def __init__(self, client, model_id: str = "gpt-5.1-codex-max"):
self._client = client
self._model_id = model_id
async def compact(
self,
messages: Sequence[ChatMessage],
options: CompactionOptions,
) -> Sequence[ChatMessage]:
response = await self._client.responses.compact(
model=self._model_id,
input=self._to_openai_format(messages),
)
return self._from_response(response)
# Usage
agent = ChatAgent(
chat_client=OpenAIChatClient(),
compaction=AutoCompactionConfig(
compactor=ManagedCompactor(
OpenAICompactionProvider(client, "gpt-5.1-codex-max")
),
threshold=100_000,
)
)
Changes Summary
| Component | Change |
|---|---|
| `ChatAgent.run()` / `run_stream()` | Add optional compaction parameter (default: USE_AGENT_DEFAULT, which falls back to the agent-level config; with no agent default this means no compaction) |
| `ChatAgent.__init__` | Add optional compaction parameter for agent-level default (default: None) |
| `AgentThread` | Add token_count, usage_details, on_usage(), compact() |
| `ChatClientAgentRunOptions` (.NET) | Add Compaction property (default: null = no compaction) |
| `ChatClientAgentOptions` (.NET) | Add Compaction property for agent-level default (default: null) |
| `AutoCompactionConfig` | Configuration class for enabling auto-compaction |
| `CompactorProtocol` / `ICompactor` | Protocol/interface for compaction strategies |
| `CompactionOptions` | Base options class |
| `TruncationCompactor` | Built-in FIFO truncation |
| `SummarizationCompactor` | Built-in LLM summarization |
| `ManagedCompactor` | Provider-specific delegation |
Note: All compaction parameters default to None/null. Users must explicitly provide an AutoCompactionConfig to enable compaction.
Configuration Precedence
Run-level compaction config (highest priority)
↓ (if not provided)
Agent-level compaction config
↓ (if not provided)
No compaction (default)
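The precedence above implies that the Python API must tell "no compaction argument passed" apart from an explicit `compaction=None`. The following is a minimal sketch of one way to do that with a sentinel value. The names `UNSET`, `_Unset`, and `resolve_compaction` are hypothetical and not part of the proposal, and `AutoCompactionConfig` is reduced to a stand-in with a single field.

```python
from dataclasses import dataclass
from typing import Optional, Union


class _Unset:
    """Sentinel type (hypothetical): the caller did not pass `compaction`."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


@dataclass(frozen=True)
class AutoCompactionConfig:
    # Stand-in for the proposed config class; only the threshold is modeled.
    threshold: int = 100_000


def resolve_compaction(
    run_level: Union[AutoCompactionConfig, None, _Unset],
    agent_level: Optional[AutoCompactionConfig],
) -> Optional[AutoCompactionConfig]:
    """Return the config that applies to this run, or None for no compaction."""
    if not isinstance(run_level, _Unset):
        # Something was passed explicitly: a config, or None to disable.
        return run_level
    # Nothing passed at run level: fall back to the agent default (may be None).
    return agent_level


agent_default = AutoCompactionConfig(threshold=80_000)
run_override = AutoCompactionConfig(threshold=50_000)
```

With this shape, `run(..., compaction=UNSET)` would be the signature default, so omitting the argument falls through to the agent-level config while `compaction=None` still disables compaction for that run.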
Python:
# Explicit opt-in at run level
agent.run("Hi", compaction=AutoCompactionConfig(...)) # Compaction enabled
# No compaction parameter = no compaction
agent.run("Hi") # No compaction (default)
# If agent has default, it's used
agent = ChatAgent(..., compaction=default_config)
agent.run("Hi") # Uses agent default
# Explicitly disable even if agent has default
agent.run("Hi", compaction=None) # No compaction

C#:
// Explicit opt-in at run level
agent.RunAsync("Hi", thread, new() { Compaction = config }); // Compaction enabled
// No run options = no compaction
agent.RunAsync("Hi", thread); // No compaction (default)
// If agent has default, it's used
var agent = new ChatClientAgent(..., new() { Compaction = defaultConfig });
agent.RunAsync("Hi", thread); // Uses agent default
// Explicitly disable even if agent has default
agent.RunAsync("Hi", thread, new() { UseDefaultCompaction = false }); // No compaction

Options Hierarchy
CompactionOptions (base)
└── max_tokens: int = 128_000
TruncationOptions(CompactionOptions)
├── preserve_recent: int = 2
├── preserve_system: bool = True
└── strategy: "aggressive" | "moderate" | "conservative"
SummarizationOptions(CompactionOptions)
├── preserve_recent: int = 3
├── summary_prompt: str
└── summary_model_id: str | None
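One possible rendering of this hierarchy is a set of dataclasses. This is a sketch only: the field names and the stated defaults come from the tree above, while the default `strategy` and the default `summary_prompt` text are assumptions, since the proposal does not specify them.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class CompactionOptions:
    max_tokens: int = 128_000


@dataclass
class TruncationOptions(CompactionOptions):
    preserve_recent: int = 2
    preserve_system: bool = True
    # Assumed default; the proposal lists the allowed values but no default.
    strategy: Literal["aggressive", "moderate", "conservative"] = "moderate"


@dataclass
class SummarizationOptions(CompactionOptions):
    preserve_recent: int = 3
    # Placeholder text; the proposal does not give a default prompt.
    summary_prompt: str = "Summarize the conversation so far."
    summary_model_id: Optional[str] = None


# Subclasses inherit max_tokens from the base options class.
opts = TruncationOptions(strategy="conservative")
```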
Token Tracking
The agent tracks actual token usage from provider responses:
# After each agent.run() or agent.run_stream():
# 1. ChatResponse includes usage_details from provider
# 2. Agent calls thread.on_usage(response.usage_details)
# 3. Thread stores input_token_count (= current context size in tokens)
# 4. If compaction configured, agent checks threshold and compacts if needed
# Access the tracked count:
thread.token_count # -> int (current context size in tokens)
thread.usage_details # -> UsageDetails with full breakdown

No heuristics: the thread uses the actual token counts reported by providers.
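The tracking steps above could be sketched as follows. `ThreadUsageTracker` and `should_compact` are hypothetical names standing in for the relevant parts of `AgentThread` and the agent's threshold check, and `UsageDetails` is reduced to two fields. The strict `>` comparison mirrors the manual example earlier in this proposal and is an assumption about the automatic check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageDetails:
    # Simplified stand-in for the provider-reported usage object.
    input_token_count: int = 0
    output_token_count: int = 0


class ThreadUsageTracker:
    """Stand-in for the usage-tracking part of AgentThread."""

    def __init__(self) -> None:
        self.usage_details: Optional[UsageDetails] = None

    def on_usage(self, usage: UsageDetails) -> None:
        # Keep only the latest report: its input count already reflects
        # the whole context that was sent on the most recent call.
        self.usage_details = usage

    @property
    def token_count(self) -> int:
        # Before any run there is no usage report, so report zero.
        return self.usage_details.input_token_count if self.usage_details else 0


def should_compact(thread: ThreadUsageTracker, threshold: int) -> bool:
    """Threshold check the agent would perform when compaction is configured."""
    return thread.token_count > threshold


tracker = ThreadUsageTracker()
tracker.on_usage(UsageDetails(input_token_count=120_000, output_token_count=800))
```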
Why This Design?
Explicit Opt-In = No Surprises
- Compaction is disabled by default—no unexpected behavior
- Users consciously choose when and how to compact
- Existing code continues to work without changes
Run-Level Configuration = Maximum Flexibility
- Different compaction strategies for different conversations
- Enable compaction only when needed
- Use aggressive compaction when context is filling up
- Use summarization for end-of-session cleanup
Follows Framework Patterns
- Same pattern as ChatOptions (agent default + run override)
- Uses protocols for extensibility (like ChatClientProtocol)
- Options classes for configuration (like ChatOptions)
Minimal API Surface
- One config class (AutoCompactionConfig)
- One protocol (CompactorProtocol)
- Built-in compactors for common cases
Zero Friction Once Opted In
# Enable compaction - just add one parameter
async for event in agent.run_stream(
    "Hello!",
    thread=thread,
    compaction=AutoCompactionConfig(compactor=TruncationCompactor())
):
    ...
# That's it - no manual checks needed from here on

Full Control Available
- Manual thread.compact() still available
- Per-run overrides
- Custom compactors via protocol
- Access to raw token counts
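As an illustration of the "custom compactors via protocol" point, here is a minimal sketch of a user-defined compactor. The protocol shape shown is an assumption: the proposed `CompactorProtocol` may also take an options argument. `ChatMessage` is a simplified stand-in for the framework's message type, and `KeepLastNCompactor` is a hypothetical example class.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol, Sequence, runtime_checkable


@dataclass
class ChatMessage:
    # Simplified stand-in for the framework's message type.
    role: str
    text: str


@runtime_checkable
class CompactorProtocol(Protocol):
    # Assumed shape; the real protocol may accept CompactionOptions as well.
    async def compact(self, messages: Sequence[ChatMessage]) -> Sequence[ChatMessage]: ...


class KeepLastNCompactor:
    """Hypothetical custom compactor: keep system messages plus the last N others."""

    def __init__(self, keep_last: int = 2) -> None:
        self._keep_last = keep_last

    async def compact(self, messages: Sequence[ChatMessage]) -> List[ChatMessage]:
        system = [m for m in messages if m.role == "system"]
        others = [m for m in messages if m.role != "system"]
        # Guard against keep_last == 0, where others[-0:] would keep everything.
        recent = others[-self._keep_last:] if self._keep_last > 0 else []
        # System messages are placed first, followed by the most recent turns.
        return system + recent


history = [
    ChatMessage("system", "You are helpful."),
    ChatMessage("user", "Hi"),
    ChatMessage("assistant", "Hello!"),
    ChatMessage("user", "Tell me more"),
]
compacted = asyncio.run(KeepLastNCompactor(keep_last=2).compact(history))
```

Because the protocol is structural, `KeepLastNCompactor` does not need to inherit from anything. Any object with a matching `compact` method would be accepted wherever a compactor is expected.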