Async topic writer wedges under reconnect: 'cannot schedule new futures after shutdown' in _encode_data_inplace (compressing codec)

### Summary

Under connection loss + reconnect (e.g. a node killed during chaos testing), the **async** topic writer using a **compressing codec** (GZIP) can get permanently wedged: `write_with_ack` never returns because the encode `ThreadPoolExecutor` was shut down while the writer is still live and re-encoding, so submitting the encode job raises `RuntimeError('cannot schedule new futures after shutdown')`. The error surfaces only as an *unretrieved background future*, and the awaiting `write_with_ack` hangs indefinitely (no timeout of its own).

### Traceback (observed)

```
grpc/aio/_call.py ... raise asyncio.CancelledError()   # stream cancelled by the node kill
asyncio.exceptions.CancelledError

ERROR  Future exception was never retrieved
future: <Future finished exception=RuntimeError('cannot schedule new futures after shutdown')>
Traceback (most recent call last):
  File ".../ydb/_topic_writer/topic_writer_asyncio.py", line 128, in write_with_ack
    results = [f.result() for f in futures]
  File ".../ydb/_topic_writer/topic_writer_asyncio.py", line 578, in _encode_loop
    await self._encode_data_inplace(batch_codec, messages)
  File ".../ydb/_topic_writer/topic_writer_asyncio.py", line 599, in _encode_data_inplace
    encoded_data_futures = eventloop.run_in_executor(self._encode_executor, encoder_function, ...)
  File ".../concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
```

### Conditions
- `ydb.aio` topic writer (`topic_client.writer(...)`), compressing codec (`TopicCodec.GZIP`).
- Connection drop + reconnect (chaos: a `CancelledError` propagates from the gRPC stream).
- After that, `_encode_data_inplace` submits to `self._encode_executor`, which has already been shut down → `RuntimeError`.

### Impact
`write_with_ack` never completes → the writer silently hangs. In an application that awaits `write_with_ack` without its own timeout, the whole writer path stalls. Same class of async-topic-writer reconnect-lifecycle issue as the earlier `WriterAsyncIOStream.create()` thread leak (PR #845).

### Root cause (likely)
`self._encode_executor` is shut down while the writer is still alive and re-encoding after a reconnect (executor lifecycle not tied correctly to the writer's live/reconnect state).

### Workaround
Use `TopicCodec.RAW` — `_encode_data_inplace` returns early for RAW and never touches the executor.

### How it was found
YDB Python SLO chaos testing (PR #851): the async topic workload silently stalled at ~half the run; the baseline container's delivery metrics went N/A mid-run. The SLO harness was hardened (timeouts + writer/reader recreate + RAW) to not hang, but the underlying SDK writer bug remains.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Async topic writer wedges under reconnect: 'cannot schedule new futures after shutdown' in _encode_data_inplace (compressing codec) #852

Summary

Traceback (observed)

Conditions

Impact

Root cause (likely)

Workaround

How it was found

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Async topic writer wedges under reconnect: 'cannot schedule new futures after shutdown' in _encode_data_inplace (compressing codec) #852

Description

Summary

Traceback (observed)

Conditions

Impact

Root cause (likely)

Workaround

How it was found

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions