Alternative concurrency approaches for Lambda Managed Instances
Follows up on #1067.
Background
With the upcoming release of Lambda Managed Instances, we are shipping the concurrency-tokio feature flag and run_concurrent APIs. The initial implementation takes a deliberately conservative approach, optimizing for footgun resistance over raw throughput. It uses bounded concurrency with separate worker tasks, each independently long-polling the Runtime API. This design reduces the impact of long-polling or slow handler futures locking up the tokio runtime, which can cause catastrophic tail latency if they delay the tasks polling the Lambda Runtime API.
This might be a good default for most users, but it leaves performance on the table for well-behaved handlers that don't risk starving the runtime. We intentionally released under a tokio-namespaced feature flag to leave room for alternative concurrency strategies in the future.
This issue tracks discussion of those alternatives. I'm glad to help scope work for anybody who picks it up, though I don't have immediate plans to do so myself.
How the Lambda Runtime API manages concurrency
To my understanding, the Lambda Runtime API data plane itself manages concurrency. The /next long-poll endpoint only returns an invocation when the runtime has capacity to handle it, so the data plane acts as the ultimate backpressure mechanism. A single worker polling /next in a loop will only receive work as fast as the data plane is willing to hand it out.
It's unclear to me whether a single /next connection can be saturated to the point where multiple pollers improve throughput (a Lambda team maintainer likely knows more). Either way, multiple worker loop futures are compatible with the alternative designs below: you can have multiple pollers while still spawning handler invocations as separate stealable tasks.
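To make the backpressure behavior concrete, here is a minimal std-only model (no tokio, no real Runtime API client; all names are illustrative). A rendezvous channel stands in for the data plane: it only hands out an invocation when the poller is actually waiting on it, just as /next only returns work when the runtime has capacity.

```rust
use std::sync::mpsc;
use std::thread;

// Toy model of the Runtime API data plane: a rendezvous channel
// (capacity 0) only releases an invocation when the runtime side
// is blocked on `recv`, mirroring how /next only returns work when
// the runtime has capacity to take it.
fn run_single_poller(invocations: Vec<u64>) -> Vec<u64> {
    let (tx, rx) = mpsc::sync_channel::<u64>(0);

    // "Data plane": hands out invocations one at a time.
    let producer = thread::spawn(move || {
        for inv in invocations {
            // Blocks until the poller polls again, so the data
            // plane never buffers work the runtime can't take.
            tx.send(inv).unwrap();
        }
    });

    // "Runtime": a single worker long-polling /next in a loop.
    let mut handled = Vec::new();
    while let Ok(event) = rx.recv() {
        // Handler runs inline; the next poll only happens after it
        // completes, so the data plane observes the backpressure.
        handled.push(event * 2); // stand-in for real handler work
    }

    producer.join().unwrap();
    handled
}
```

The point of the model is that a single sequential poll loop can never pull work faster than it finishes it, regardless of how eagerly the data plane offers invocations.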
A Few Options
1. Unbounded tokio worker (no concurrency limits)
Offer an alternative tokio implementation that runs a single worker talking to the Runtime API, spawning tasks directly onto a tokio runtime with no concurrency bounds. This relies on the Lambda runtime data plane to manage concurrency and processes incoming requests at maximum possible speed.
We could also self-manage concurrency via a semaphore instead of sequential worker processing loops.
Tradeoffs:
- For well-behaved handlers, this is likely more efficient (less scheduling overhead, no artificial backpressure).
- The main downside is footgun resistance: it is easier to lock up worker futures when long-polling tasks cause tokio scheduling contention. This can be catastrophic for tail latency.
This could be exposed as a tunable strategy on a builder within the existing concurrency-tokio feature, rather than requiring a new feature flag. Strawman API:
```rust
use lambda_runtime::ConcurrentConfig;

// Current default: N independent workers, each with their own /next loop.
// Conservative, footgun-resistant. These two are equivalent:
lambda_runtime::run_concurrent(handler).await?;

let config = ConcurrentConfig::builder()
    .strategy(Strategy::WorkerPerSlot)
    .build();
lambda_runtime::run_concurrent_with_config(handler, config).await?;

// Option A: single /next poller, unbounded task spawning.
// Relies on the Lambda data plane for backpressure.
let config = ConcurrentConfig::builder()
    .strategy(Strategy::Unbounded)
    .build();
lambda_runtime::run_concurrent_with_config(handler, config).await?;

// Option B: N workers, but concurrency bounded by a semaphore
// instead of one-request-per-worker.
let config = ConcurrentConfig::builder()
    .strategy(Strategy::Semaphore { permits: 64 })
    .build();
lambda_runtime::run_concurrent_with_config(handler, config).await?;
```

2. Rayon support for CPU-dominated work
For handlers that are CPU-bound, integrating with rayon's work-stealing thread pool would allow offloading compute-heavy tasks without blocking the async runtime.
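The dispatch shape this implies can be modeled with std threads alone (a stand-in for rayon's pool; function and channel names are illustrative): the polling side hands CPU-heavy jobs to a pool of compute threads and collects results over channels, so it never blocks on computation itself.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// std-only sketch: the main thread stands in for the async I/O side,
// a fixed pool of threads stands in for rayon's work-stealing pool.
fn offload_cpu_work(inputs: Vec<u64>, workers: usize) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<u64>();
    let job_rx = Arc::new(Mutex::new(job_rx)); // shared job queue
    let (res_tx, res_rx) = mpsc::channel::<u64>();

    let mut pool = Vec::new();
    for _ in 0..workers {
        let (job_rx, res_tx) = (Arc::clone(&job_rx), res_tx.clone());
        pool.push(thread::spawn(move || loop {
            let job = match job_rx.lock().unwrap().recv() {
                Ok(j) => j,
                Err(_) => break, // queue closed: shut down
            };
            // CPU-bound "handler": sum of squares up to `job`.
            res_tx.send((1..=job).map(|i| i * i).sum()).unwrap();
        }));
    }

    let n = inputs.len();
    for job in inputs {
        job_tx.send(job).unwrap(); // dispatch without blocking on compute
    }
    drop(job_tx); // let workers exit once the queue drains
    drop(res_tx);

    let total: u64 = res_rx.iter().take(n).sum();
    for h in pool { h.join().unwrap(); }
    total
}
```

Real rayon would replace the hand-rolled pool and do proper work stealing; the sketch only shows the split between the dispatching side and the compute side.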
This would require either a broader executor abstraction or a dedicated implementation behind a feature flag (e.g. rayon-concurrency). Strawman API:
```rust
// feature = "rayon-concurrency"
use lambda_runtime::rayon::run_concurrent;

// Handler runs on a rayon thread pool. The runtime manages
// async I/O (polling /next, posting responses) on a small
// tokio runtime, and dispatches handler invocations to rayon.
run_concurrent(|event: LambdaEvent<Request>| {
    // CPU-heavy sync work here, no async required.
    // rayon's work-stealing distributes across cores.
    Ok(Response { /* ... */ })
}).await?;
```

3. Compio / thread-per-core async
Support compio or a similar thread-per-core async model. This is relevant for workloads where per-core isolation matters and shared-nothing architectures are preferred.
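The shared-nothing shape can be modeled with std threads (illustrative only; compio's actual API differs): each "core" owns a private queue and drains only its own work, with no cross-thread stealing.

```rust
use std::sync::mpsc;
use std::thread;

// std-only model of thread-per-core: invocations are partitioned
// round-robin across per-core queues up front and never migrate.
// Returns each core's aggregated result, in core order.
fn run_shared_nothing(invocations: Vec<u64>, cores: usize) -> Vec<u64> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..cores {
        // One private channel per core; no shared work queue.
        let (tx, rx) = mpsc::channel::<u64>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // Per-core loop: drain only this core's queue.
            rx.iter().map(|ev| ev + 1).sum::<u64>() // stand-in handler
        }));
    }
    for (i, inv) in invocations.into_iter().enumerate() {
        senders[i % cores].send(inv).unwrap();
    }
    drop(senders); // close the queues so per-core loops terminate
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The tradeoff the model makes visible: an idle core cannot help a busy one, which is exactly the isolation property thread-per-core workloads want.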
More generally, this could take the form of an async executor abstraction that allows plugging in alternative runtimes beyond tokio. Strawman API:
```rust
// feature = "compio-concurrency"
use lambda_runtime::compio::run_concurrent;

// Each core runs its own async runtime with a dedicated /next poller.
// No cross-thread work stealing; shared-nothing by default.
run_concurrent(handler).await?;
```

Alternatively, a more general executor abstraction:
```rust
use lambda_runtime::{run_concurrent_with_executor, Executor};

struct MyExecutor;

impl Executor for MyExecutor {
    // Spawn, block_on, etc.
}

run_concurrent_with_executor(handler, MyExecutor).await?;
```

Design considerations
- Option 1 could easily fit within the existing concurrency-tokio feature as a builder configuration on the concurrency strategy.
- Options 2 and 3 are larger efforts that need either a general executor abstraction or separate feature-flagged implementations.
- Footgun resistance vs. performance is the central tension: stricter concurrency bounds protect against tail latency blowups, while looser bounds improve throughput for well-behaved handlers.