Conversation
… any TorchRL collector and yields batches of exactly N complete, padded trajectories. It reconstructs episodes split across collector iterations using traj_ids, supports strict on-policy mode (discard partials on weight update), and outputs standard TensorDict batches with a (collector, mask) field.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3584
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Cancelled Job, 2 Pending as of commit b673adf with merge base 143b67e.
NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] | BugFix | [BugFix] Fix memory leak in collector |
| [Feature] | Feature | [Feature] Add new optimizer |
| [Doc] or [Docs] | Documentation | [Doc] Update installation guide |
| [Refactor] | Refactoring | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Tests | [Tests] Add unit tests for buffer |
| [Environment] or [Environments] | Environments | [Environments] Add Gymnasium support |
| [Data] | Data | [Data] Fix replay buffer sampling |
| [Performance] or [Perf] | Performance | [Performance] Optimize tensor ops |
| [BC-Breaking] | bc breaking | [BC-Breaking] Remove deprecated API |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Quality] | Quality | [Quality] Fix typos and add codespell |

Note: Common variations like singular/plural are supported (e.g., [Doc] or [Docs]).
vmoens left a comment
Thanks for this!
Curious to hear your thoughts about the need for a new class instead of a new kwarg in collector build functions?
As a wrapper, TrajectoryBatcher avoids modifying every collector class; it handles the cross-batch trajectory reassembly and provides the strict_on_policy bookkeeping. We can't do that easily by adding a kwarg.
Is it as efficient though? Can we make this a post-process rather than a wrapper?
TrajectoryBatcher is gone; the logic now lives behind a num_trajectories_per_batch kwarg on all collectors, implemented in BaseCollector's __iter__ using private stateful helpers.
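The bookkeeping behind that kwarg can be sketched in plain Python. This is a simplified model, not the TorchRL implementation: `batch_trajectories` is a hypothetical helper, frames are `(traj_id, value, done)` tuples rather than TensorDicts, and padding is done with nested lists.

```python
from collections import defaultdict

def batch_trajectories(collector_iters, num_trajectories):
    """Reassemble trajectories split across collector iterations and
    yield zero-padded batches of exactly `num_trajectories` complete
    episodes, each with a validity mask.  Toy model only."""
    partial = defaultdict(list)   # traj_id -> frames collected so far
    complete = []                 # finished episodes, in completion order

    for frames in collector_iters:
        for traj_id, value, done in frames:
            partial[traj_id].append(value)
            if done:
                complete.append(partial.pop(traj_id))
        # Emit a batch whenever enough full episodes are available.
        while len(complete) >= num_trajectories:
            batch = complete[:num_trajectories]
            complete = complete[num_trajectories:]
            max_len = max(len(t) for t in batch)
            # Zero-pad to [N, max_len] and mark valid steps.
            padded = [t + [0] * (max_len - len(t)) for t in batch]
            mask = [[i < len(t) for i in range(max_len)] for t in batch]
            yield padded, mask

# Two collector iterations; trajectory 1 spans both of them.
iters = [
    [(0, 10, False), (0, 11, True), (1, 20, False)],
    [(1, 21, False), (1, 22, True)],
]
batches = list(batch_trajectories(iters, num_trajectories=2))
```

Trajectory 1 is started in the first iteration and finished in the second, so the only batch contains one padded episode `[10, 11, 0]` and one full-length episode `[20, 21, 22]`.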
vmoens left a comment
I love it!
Some comments on tests and missing items from docstrings.
Thanks for this!
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
vmoens left a comment
Amazing!
Now I think a follow-up PR would be nice: it could cover using this to fill a replay buffer with trajectories (using collector.start() for async collection).
This would work best with SliceSampler and similar trajectory-oriented tooling in the RB API.
Description
Adds TrajectoryBatcher to torchrl/collectors/utils.py (and exports it from torchrl.collectors). It wraps any TorchRL collector and yields batches of exactly num_trajectories complete episodes, reconstructed across collector iterations using ("collector", "traj_ids"). Output is a zero-padded [N, max_len, ...] TensorDict with a ("collector", "mask") boolean field marking valid time steps.
```python
from torchrl.collectors import Collector, TrajectoryBatcher

collector = Collector(env_fn, policy, frames_per_batch=200, total_frames=10_000)
batcher = TrajectoryBatcher(collector, num_trajectories=32, strict_on_policy=True)

for batch in batcher:
    # batch.shape == [32, max_episode_len]
    # batch[("collector", "mask")] marks valid steps
    loss = compute_loss(batch)
    loss.backward()
    optimizer.step()
    batcher.update_policy_weights_()  # burns partial trajectories when strict_on_policy=True
```
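The strict_on_policy rule amounts to discarding any partially collected episode whenever the policy weights change, so that no yielded trajectory mixes actions from two policy versions. A minimal model of that rule (plain Python; `StrictOnPolicyBuffer` is a hypothetical illustration, not the TorchRL code):

```python
class StrictOnPolicyBuffer:
    """Track per-trajectory frames; drop partials on weight update."""

    def __init__(self):
        self.partial = {}    # traj_id -> frames from the current policy
        self.complete = []   # finished episodes, safe to yield

    def add(self, traj_id, frame, done):
        self.partial.setdefault(traj_id, []).append(frame)
        if done:
            self.complete.append(self.partial.pop(traj_id))

    def update_policy_weights_(self):
        # Any episode still in flight would mix old- and new-policy
        # actions once collection resumes, so burn it.
        dropped = len(self.partial)
        self.partial.clear()
        return dropped

buf = StrictOnPolicyBuffer()
buf.add(0, "s0", done=False)   # trajectory 0 still in flight
buf.add(1, "s1", done=True)    # trajectory 1 complete
dropped = buf.update_policy_weights_()  # trajectory 0 is discarded
```

Completed episodes survive the update; only the in-flight trajectory is dropped.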
Fixes #3234
Motivation and Context
Many RL algorithms (Monte Carlo / REINFORCE, episodic PPO, imitation learning, RLHF) require full episodes padded into a single batch. Currently users must manually:
- track traj_ids,
- rebuild trajectories across collector iterations,
- detect episode completion via done/mask,
- pad variable-length episodes, and
- optionally discard mixed-policy episodes when the policy is updated.

This logic is error-prone and reimplemented repeatedly.
There is no existing high-level utility in TorchRL that provides "give me exactly N full episodes as a padded TensorDict", despite the collectors already exposing all the needed information (traj_ids, done, mask).
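Padding is only half the story: every downstream computation must also respect the mask so that padded steps contribute nothing. A masked discounted-return pass over a padded [N, max_len] batch can be sketched in plain Python (illustrative only; `discounted_returns`, the nested-list representation, and the sample numbers are assumptions, not TorchRL API):

```python
def discounted_returns(rewards, mask, gamma=0.99):
    """Per-step discounted returns over a zero-padded batch.

    `rewards` and `mask` are [N][max_len] nested lists; padded steps
    (mask False) are skipped and keep a return of 0.0.
    """
    returns = []
    for ep_rewards, ep_mask in zip(rewards, mask):
        g = 0.0
        ep_returns = [0.0] * len(ep_rewards)
        # Walk backwards so each step accumulates its discounted future.
        for t in reversed(range(len(ep_rewards))):
            if ep_mask[t]:
                g = ep_rewards[t] + gamma * g
                ep_returns[t] = g
        returns.append(ep_returns)
    return returns

rewards = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]      # row 0 is padded at t=2
mask = [[True, True, False], [True, True, True]]
rets = discounted_returns(rewards, mask, gamma=0.5)
```

With gamma=0.5 the first (two-step) episode yields returns [1.5, 1.0, 0.0] and the second [1.75, 1.5, 1.0]; the padded slot stays at 0.0 because the mask excludes it.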