[Performance] Add out= parameter to _StepMDP for output buffer reuse #3561

Closed
vmoens wants to merge 2 commits into gh/vmoens/242/base from gh/vmoens/242/head

Collaborator

@vmoens vmoens commented Mar 23, 2026

Stack from ghstack (oldest at bottom):

_StepMDP.__call__ now accepts an optional out parameter. When provided,
the output TensorDict is reused instead of a new one being allocated on each call.
This lets callers (collectors, rollout loops) pre-allocate a buffer
and avoid per-step TensorDict creation overhead.
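
The buffer-reuse pattern described above can be sketched in plain Python. This is a toy stand-in, not torchrl's implementation: `toy_step_mdp` is a hypothetical function, plain dicts stand in for TensorDict, and the real _StepMDP handles many more key-selection options than shown here.

```python
from typing import Any, Dict, Optional


def toy_step_mdp(
    td: Dict[str, Any], out: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical stand-in: promote the entries under "next" to the root.

    If `out` is given, it is filled in place and returned, so no new
    container is allocated on this call.
    """
    if out is None:
        out = {}  # default path: allocate a fresh output on every call
    for key, value in td["next"].items():
        out[key] = value  # overwrite entries in the (possibly reused) buffer
    return out


# A rollout-style loop that pre-allocates the buffer once and reuses it.
buffer: Dict[str, Any] = {}
td = {"obs": 0, "next": {"obs": 1, "done": False}}
for step in range(3):
    result = toy_step_mdp(td, out=buffer)
    assert result is buffer  # the same object comes back on every iteration
    td = {"obs": result["obs"], "next": {"obs": result["obs"] + 1, "done": False}}
```

The trade-off of any `out=` pattern is aliasing: each call overwrites the previous result, so a caller that needs to keep a step's output has to copy it before the next call.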

Also fixes the return type annotation of _exclude and ensures it returns the
caller-provided out buffer even when no new keys are set.
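
The return contract for a supplied buffer can be illustrated with a small sketch. The actual signature of _exclude is not shown in this description, so `toy_exclude`, its arguments, and the use of plain dicts are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Optional


def toy_exclude(
    src: Dict[str, Any],
    excluded: Iterable[str],
    out: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: copy every non-excluded entry of `src` into `out`."""
    excluded = set(excluded)
    if out is None:
        out = {}
    for key, value in src.items():
        if key not in excluded:
            out[key] = value
    # Always return the buffer, including when the loop copied nothing.
    return out


# Every key is excluded here, yet the caller still gets its own buffer back.
buffer: Dict[str, Any] = {"stale": 1}
same = toy_exclude({"a": 1}, excluded=["a"], out=buffer)
```

Returning the buffer unconditionally means callers can rely on `result is out` without special-casing the empty case.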

Made-with: Cursor

[ghstack-poisoned]

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3561

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

⏳ No Failures, 14 Pending

As of commit bb2c911 with merge base a4301ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

github-actions bot commented Mar 23, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}10$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 85.7906μs 84.0190μs 11.9021 KOps/s 11.7690 KOps/s $\color{#35bf28}+1.13\%$
test_tensor_to_bytestream_speed[torch.save] 0.1456ms 0.1439ms 6.9495 KOps/s 6.8857 KOps/s $\color{#35bf28}+0.93\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1112s 0.1108s 9.0237 Ops/s 8.8025 Ops/s $\color{#35bf28}+2.51\%$
test_tensor_to_bytestream_speed[numpy] 2.6645μs 2.6502μs 377.3291 KOps/s 372.1609 KOps/s $\color{#35bf28}+1.39\%$
test_tensor_to_bytestream_speed[safetensors] 39.1302μs 38.0561μs 26.2770 KOps/s 25.5326 KOps/s $\color{#35bf28}+2.92\%$
test_simple 0.6733s 0.5728s 1.7457 Ops/s 1.7410 Ops/s $\color{#35bf28}+0.27\%$
test_transformed 1.1008s 1.0996s 0.9094 Ops/s 0.8941 Ops/s $\color{#35bf28}+1.72\%$
test_serial 1.7108s 1.7087s 0.5852 Ops/s 0.5809 Ops/s $\color{#35bf28}+0.74\%$
test_parallel 1.0271s 1.0238s 0.9768 Ops/s 0.9475 Ops/s $\color{#35bf28}+3.09\%$
test_step_mdp_speed[True-True-True-True-True] 0.2138ms 41.8582μs 23.8902 KOps/s 23.9513 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-True-True-True-False] 0.1050ms 23.1687μs 43.1616 KOps/s 43.2266 KOps/s $\color{#d91a1a}-0.15\%$
test_step_mdp_speed[True-True-True-False-True] 94.2820μs 24.0301μs 41.6145 KOps/s 41.6271 KOps/s $\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-True-True-False-False] 35.4510μs 13.0172μs 76.8214 KOps/s 77.9363 KOps/s $\color{#d91a1a}-1.43\%$
test_step_mdp_speed[True-True-False-True-True] 73.3320μs 45.2559μs 22.0966 KOps/s 22.7804 KOps/s $\color{#d91a1a}-3.00\%$
test_step_mdp_speed[True-True-False-True-False] 56.5610μs 25.8053μs 38.7518 KOps/s 38.8720 KOps/s $\color{#d91a1a}-0.31\%$
test_step_mdp_speed[True-True-False-False-True] 52.8710μs 26.8617μs 37.2277 KOps/s 37.4986 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[True-True-False-False-False] 39.8010μs 15.7633μs 63.4386 KOps/s 64.6036 KOps/s $\color{#d91a1a}-1.80\%$
test_step_mdp_speed[True-False-True-True-True] 79.3820μs 47.9195μs 20.8683 KOps/s 21.2188 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[True-False-True-True-False] 66.1210μs 28.6203μs 34.9402 KOps/s 35.4003 KOps/s $\color{#d91a1a}-1.30\%$
test_step_mdp_speed[True-False-True-False-True] 57.5920μs 26.9085μs 37.1630 KOps/s 38.1802 KOps/s $\color{#d91a1a}-2.66\%$
test_step_mdp_speed[True-False-True-False-False] 45.3310μs 15.4691μs 64.6450 KOps/s 64.5625 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[True-False-False-True-True] 90.6120μs 49.3912μs 20.2465 KOps/s 20.5040 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[True-False-False-True-False] 98.8520μs 30.6574μs 32.6186 KOps/s 32.9387 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[True-False-False-False-True] 58.7010μs 29.3276μs 34.0976 KOps/s 34.2655 KOps/s $\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-False-False-False-False] 47.4410μs 18.2212μs 54.8812 KOps/s 56.2947 KOps/s $\color{#d91a1a}-2.51\%$
test_step_mdp_speed[False-True-True-True-True] 84.0110μs 47.6325μs 20.9941 KOps/s 21.1355 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-True-True-True-False] 57.8010μs 28.1871μs 35.4772 KOps/s 35.5675 KOps/s $\color{#d91a1a}-0.25\%$
test_step_mdp_speed[False-True-True-False-True] 2.3418ms 30.7873μs 32.4809 KOps/s 33.3710 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-True-True-False-False] 53.1820μs 17.1514μs 58.3044 KOps/s 58.3944 KOps/s $\color{#d91a1a}-0.15\%$
test_step_mdp_speed[False-True-False-True-True] 84.9720μs 49.5239μs 20.1923 KOps/s 20.2514 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-True-False-True-False] 80.5620μs 31.0168μs 32.2406 KOps/s 32.5933 KOps/s $\color{#d91a1a}-1.08\%$
test_step_mdp_speed[False-True-False-False-True] 67.6310μs 32.5521μs 30.7199 KOps/s 30.8554 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[False-True-False-False-False] 49.4910μs 19.7472μs 50.6402 KOps/s 51.6852 KOps/s $\color{#d91a1a}-2.02\%$
test_step_mdp_speed[False-False-True-True-True] 86.8820μs 53.8085μs 18.5844 KOps/s 19.4556 KOps/s $\color{#d91a1a}-4.48\%$
test_step_mdp_speed[False-False-True-True-False] 0.1056ms 33.2920μs 30.0373 KOps/s 30.3288 KOps/s $\color{#d91a1a}-0.96\%$
test_step_mdp_speed[False-False-True-False-True] 65.0010μs 32.2537μs 31.0042 KOps/s 31.4735 KOps/s $\color{#d91a1a}-1.49\%$
test_step_mdp_speed[False-False-True-False-False] 48.2410μs 19.6279μs 50.9480 KOps/s 51.5356 KOps/s $\color{#d91a1a}-1.14\%$
test_step_mdp_speed[False-False-False-True-True] 87.0510μs 55.6706μs 17.9628 KOps/s 18.4551 KOps/s $\color{#d91a1a}-2.67\%$
test_step_mdp_speed[False-False-False-True-False] 66.0310μs 35.9124μs 27.8455 KOps/s 27.8330 KOps/s $\color{#35bf28}+0.05\%$
test_step_mdp_speed[False-False-False-False-True] 70.4710μs 34.2508μs 29.1964 KOps/s 28.3574 KOps/s $\color{#35bf28}+2.96\%$
test_step_mdp_speed[False-False-False-False-False] 52.1210μs 22.4166μs 44.6097 KOps/s 44.7355 KOps/s $\color{#d91a1a}-0.28\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7339s 0.7300s 1.3698 Ops/s 1.3250 Ops/s $\color{#35bf28}+3.38\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7250s 0.6180s 1.6180 Ops/s 1.6201 Ops/s $\color{#d91a1a}-0.13\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7543s 1.6623s 0.6016 Ops/s 0.6011 Ops/s $\color{#35bf28}+0.08\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5264s 1.4414s 0.6938 Ops/s 0.6921 Ops/s $\color{#35bf28}+0.25\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 2.0078s 1.9217s 0.5204 Ops/s 0.5216 Ops/s $\color{#d91a1a}-0.23\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7797s 1.6940s 0.5903 Ops/s 0.5922 Ops/s $\color{#d91a1a}-0.32\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7256s 4.6674s 0.2143 Ops/s 0.2151 Ops/s $\color{#d91a1a}-0.42\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.5471s 4.4136s 0.2266 Ops/s 0.2253 Ops/s $\color{#35bf28}+0.56\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 2.0162s 1.9117s 0.5231 Ops/s 0.5303 Ops/s $\color{#d91a1a}-1.36\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6821s 1.5892s 0.6292 Ops/s 0.5942 Ops/s $\textbf{\color{#35bf28}+5.89\%}$
test_values[generalized_advantage_estimate-True-True] 10.5604ms 10.3470ms 96.6466 Ops/s 99.5685 Ops/s $\color{#d91a1a}-2.93\%$
test_values[vec_generalized_advantage_estimate-True-True] 19.8090ms 17.6399ms 56.6897 Ops/s 56.6992 Ops/s $\color{#d91a1a}-0.02\%$
test_values[td0_return_estimate-False-False] 0.2351ms 0.1300ms 7.6915 KOps/s 7.6104 KOps/s $\color{#35bf28}+1.07\%$
test_values[td1_return_estimate-False-False] 28.6411ms 28.2552ms 35.3917 Ops/s 35.6798 Ops/s $\color{#d91a1a}-0.81\%$
test_values[vec_td1_return_estimate-False-False] 20.6756ms 17.9120ms 55.8285 Ops/s 56.0166 Ops/s $\color{#d91a1a}-0.34\%$
test_values[td_lambda_return_estimate-True-False] 42.4328ms 41.7016ms 23.9799 Ops/s 24.5479 Ops/s $\color{#d91a1a}-2.31\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.1035ms 17.7011ms 56.4936 Ops/s 56.5468 Ops/s $\color{#d91a1a}-0.09\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.2857ms 9.2057ms 108.6287 Ops/s 112.0663 Ops/s $\color{#d91a1a}-3.07\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7812ms 1.5576ms 642.0233 Ops/s 641.5771 Ops/s $\color{#35bf28}+0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4974ms 0.4319ms 2.3156 KOps/s 2.3150 KOps/s $\color{#35bf28}+0.02\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.7709ms 34.8305ms 28.7104 Ops/s 28.3177 Ops/s $\color{#35bf28}+1.39\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1.8518ms 1.7252ms 579.6440 Ops/s 580.0244 Ops/s $\color{#d91a1a}-0.07\%$
test_dqn_speed[False-None] 1.5059ms 1.4111ms 708.6610 Ops/s 689.8803 Ops/s $\color{#35bf28}+2.72\%$
test_dqn_speed[False-backward] 2.0390ms 1.9517ms 512.3798 Ops/s 505.2917 Ops/s $\color{#35bf28}+1.40\%$
test_dqn_speed[True-None] 0.8071ms 0.5738ms 1.7427 KOps/s 1.7222 KOps/s $\color{#35bf28}+1.19\%$
test_dqn_speed[True-backward] 1.1136ms 1.0564ms 946.6465 Ops/s 840.8758 Ops/s $\textbf{\color{#35bf28}+12.58\%}$
test_dqn_speed[reduce-overhead-None] 0.7334ms 0.5611ms 1.7821 KOps/s 1.7493 KOps/s $\color{#35bf28}+1.87\%$
test_ddpg_speed[False-None] 3.2285ms 2.8900ms 346.0239 Ops/s 350.1665 Ops/s $\color{#d91a1a}-1.18\%$
test_ddpg_speed[False-backward] 4.2350ms 4.0975ms 244.0538 Ops/s 243.3542 Ops/s $\color{#35bf28}+0.29\%$
test_ddpg_speed[True-None] 1.6233ms 1.4746ms 678.1558 Ops/s 666.2942 Ops/s $\color{#35bf28}+1.78\%$
test_ddpg_speed[True-backward] 2.5638ms 2.5046ms 399.2625 Ops/s 393.7675 Ops/s $\color{#35bf28}+1.40\%$
test_ddpg_speed[reduce-overhead-None] 1.7383ms 1.4677ms 681.3341 Ops/s 689.1228 Ops/s $\color{#d91a1a}-1.13\%$
test_sac_speed[False-None] 8.6895ms 8.1651ms 122.4729 Ops/s 121.6897 Ops/s $\color{#35bf28}+0.64\%$
test_sac_speed[False-backward] 12.0005ms 11.4659ms 87.2154 Ops/s 85.5290 Ops/s $\color{#35bf28}+1.97\%$
test_sac_speed[True-None] 2.3931ms 2.2580ms 442.8754 Ops/s 451.6738 Ops/s $\color{#d91a1a}-1.95\%$
test_sac_speed[True-backward] 4.3135ms 4.2130ms 237.3602 Ops/s 244.5113 Ops/s $\color{#d91a1a}-2.92\%$
test_sac_speed[reduce-overhead-None] 2.4355ms 2.2435ms 445.7400 Ops/s 454.1539 Ops/s $\color{#d91a1a}-1.85\%$
test_redq_speed[False-None] 15.8068ms 10.8729ms 91.9715 Ops/s 91.6400 Ops/s $\color{#35bf28}+0.36\%$
test_redq_speed[False-backward] 22.3640ms 18.3574ms 54.4739 Ops/s 54.4496 Ops/s $\color{#35bf28}+0.04\%$
test_redq_speed[True-None] 4.9442ms 4.6792ms 213.7127 Ops/s 210.1151 Ops/s $\color{#35bf28}+1.71\%$
test_redq_speed[reduce-overhead-None] 5.0361ms 4.6026ms 217.2662 Ops/s 207.3352 Ops/s $\color{#35bf28}+4.79\%$
test_redq_deprec_speed[False-None] 11.9427ms 11.3576ms 88.0464 Ops/s 88.6460 Ops/s $\color{#d91a1a}-0.68\%$
test_redq_deprec_speed[False-backward] 17.0402ms 16.4416ms 60.8214 Ops/s 61.3382 Ops/s $\color{#d91a1a}-0.84\%$
test_redq_deprec_speed[True-None] 3.9795ms 3.7516ms 266.5523 Ops/s 270.6746 Ops/s $\color{#d91a1a}-1.52\%$
test_redq_deprec_speed[True-backward] 8.1867ms 7.7449ms 129.1171 Ops/s 130.9711 Ops/s $\color{#d91a1a}-1.42\%$
test_redq_deprec_speed[reduce-overhead-None] 4.0777ms 3.7150ms 269.1776 Ops/s 271.9465 Ops/s $\color{#d91a1a}-1.02\%$
test_td3_speed[False-None] 9.1648ms 8.2837ms 120.7194 Ops/s 122.4444 Ops/s $\color{#d91a1a}-1.41\%$
test_td3_speed[False-backward] 11.4222ms 11.1284ms 89.8604 Ops/s 90.7265 Ops/s $\color{#d91a1a}-0.95\%$
test_td3_speed[True-None] 1.9607ms 1.8915ms 528.6686 Ops/s 533.7130 Ops/s $\color{#d91a1a}-0.95\%$
test_td3_speed[True-backward] 3.8928ms 3.6689ms 272.5633 Ops/s 273.7127 Ops/s $\color{#d91a1a}-0.42\%$
test_td3_speed[reduce-overhead-None] 1.9240ms 1.8481ms 541.0923 Ops/s 550.2645 Ops/s $\color{#d91a1a}-1.67\%$
test_cql_speed[False-None] 30.6340ms 26.8677ms 37.2195 Ops/s 37.7917 Ops/s $\color{#d91a1a}-1.51\%$
test_cql_speed[False-backward] 39.9192ms 36.1357ms 27.6734 Ops/s 27.5047 Ops/s $\color{#35bf28}+0.61\%$
test_cql_speed[True-None] 15.5284ms 12.9687ms 77.1085 Ops/s 78.3082 Ops/s $\color{#d91a1a}-1.53\%$
test_cql_speed[True-backward] 18.8255ms 18.2790ms 54.7076 Ops/s 55.9372 Ops/s $\color{#d91a1a}-2.20\%$
test_cql_speed[reduce-overhead-None] 13.3764ms 12.9476ms 77.2344 Ops/s 80.1344 Ops/s $\color{#d91a1a}-3.62\%$
test_a2c_speed[False-None] 5.9292ms 5.5396ms 180.5197 Ops/s 183.2923 Ops/s $\color{#d91a1a}-1.51\%$
test_a2c_speed[False-backward] 12.5332ms 12.0710ms 82.8433 Ops/s 83.9974 Ops/s $\color{#d91a1a}-1.37\%$
test_a2c_speed[True-None] 4.1635ms 3.8724ms 258.2390 Ops/s 256.0750 Ops/s $\color{#35bf28}+0.85\%$
test_a2c_speed[True-backward] 9.1611ms 8.9084ms 112.2542 Ops/s 107.2174 Ops/s $\color{#35bf28}+4.70\%$
test_a2c_speed[reduce-overhead-None] 4.1805ms 3.8383ms 260.5322 Ops/s 259.7826 Ops/s $\color{#35bf28}+0.29\%$
test_ppo_speed[False-None] 6.5488ms 5.9663ms 167.6082 Ops/s 166.5271 Ops/s $\color{#35bf28}+0.65\%$
test_ppo_speed[False-backward] 12.8723ms 12.5977ms 79.3798 Ops/s 78.8960 Ops/s $\color{#35bf28}+0.61\%$
test_ppo_speed[True-None] 4.0517ms 3.8487ms 259.8250 Ops/s 263.3360 Ops/s $\color{#d91a1a}-1.33\%$
test_ppo_speed[True-backward] 9.1091ms 8.8019ms 113.6121 Ops/s 112.5167 Ops/s $\color{#35bf28}+0.97\%$
test_ppo_speed[reduce-overhead-None] 4.1814ms 3.8205ms 261.7435 Ops/s 263.5204 Ops/s $\color{#d91a1a}-0.67\%$
test_reinforce_speed[False-None] 4.7722ms 4.5906ms 217.8376 Ops/s 219.1425 Ops/s $\color{#d91a1a}-0.60\%$
test_reinforce_speed[False-backward] 8.0172ms 7.5052ms 133.2410 Ops/s 133.8750 Ops/s $\color{#d91a1a}-0.47\%$
test_reinforce_speed[True-None] 3.5954ms 3.0753ms 325.1678 Ops/s 332.1843 Ops/s $\color{#d91a1a}-2.11\%$
test_reinforce_speed[True-backward] 8.5095ms 8.0786ms 123.7831 Ops/s 122.4166 Ops/s $\color{#35bf28}+1.12\%$
test_reinforce_speed[reduce-overhead-None] 3.2707ms 3.0249ms 330.5892 Ops/s 325.9361 Ops/s $\color{#35bf28}+1.43\%$
test_iql_speed[False-None] 21.2430ms 20.2535ms 49.3743 Ops/s 47.9400 Ops/s $\color{#35bf28}+2.99\%$
test_iql_speed[False-backward] 32.7698ms 30.9613ms 32.2983 Ops/s 32.3227 Ops/s $\color{#d91a1a}-0.08\%$
test_iql_speed[True-None] 9.3127ms 8.7911ms 113.7511 Ops/s 114.8077 Ops/s $\color{#d91a1a}-0.92\%$
test_iql_speed[True-backward] 17.9517ms 17.1864ms 58.1856 Ops/s 59.2773 Ops/s $\color{#d91a1a}-1.84\%$
test_iql_speed[reduce-overhead-None] 9.2170ms 8.7074ms 114.8454 Ops/s 115.6028 Ops/s $\color{#d91a1a}-0.66\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.2448ms 6.0963ms 164.0345 Ops/s 162.8184 Ops/s $\color{#35bf28}+0.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.9911ms 0.3019ms 3.3121 KOps/s 2.9692 KOps/s $\textbf{\color{#35bf28}+11.55\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5337ms 0.2747ms 3.6402 KOps/s 3.1033 KOps/s $\textbf{\color{#35bf28}+17.30\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1563ms 5.8557ms 170.7738 Ops/s 169.1058 Ops/s $\color{#35bf28}+0.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7058ms 0.2917ms 3.4281 KOps/s 3.0454 KOps/s $\textbf{\color{#35bf28}+12.57\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4918ms 0.2779ms 3.5990 KOps/s 3.1749 KOps/s $\textbf{\color{#35bf28}+13.36\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5111ms 1.2910ms 774.6128 Ops/s 744.1092 Ops/s $\color{#35bf28}+4.10\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.4105ms 1.2128ms 824.5263 Ops/s 792.8441 Ops/s $\color{#35bf28}+4.00\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 10.1477ms 6.1226ms 163.3290 Ops/s 165.3298 Ops/s $\color{#d91a1a}-1.21\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9029ms 0.4932ms 2.0276 KOps/s 2.2152 KOps/s $\textbf{\color{#d91a1a}-8.47\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8087ms 0.4834ms 2.0685 KOps/s 2.3037 KOps/s $\textbf{\color{#d91a1a}-10.21\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9302ms 5.8454ms 171.0758 Ops/s 170.4633 Ops/s $\color{#35bf28}+0.36\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.6260ms 0.2948ms 3.3918 KOps/s 2.8039 KOps/s $\textbf{\color{#35bf28}+20.97\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4652ms 0.2726ms 3.6687 KOps/s 2.9738 KOps/s $\textbf{\color{#35bf28}+23.37\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1477ms 5.8101ms 172.1143 Ops/s 169.5301 Ops/s $\color{#35bf28}+1.52\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1162ms 0.2900ms 3.4477 KOps/s 2.7117 KOps/s $\textbf{\color{#35bf28}+27.15\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.4785ms 0.2696ms 3.7089 KOps/s 2.9817 KOps/s $\textbf{\color{#35bf28}+24.39\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.1764ms 6.0170ms 166.1958 Ops/s 166.0060 Ops/s $\color{#35bf28}+0.11\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9248ms 0.4911ms 2.0364 KOps/s 1.9455 KOps/s $\color{#35bf28}+4.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7363ms 0.4613ms 2.1677 KOps/s 2.1488 KOps/s $\color{#35bf28}+0.88\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4637ms 5.0767ms 196.9765 Ops/s 49.2032 Ops/s $\textbf{\color{#35bf28}+300.33\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 4.0429ms 1.9963ms 500.9223 Ops/s 480.3819 Ops/s $\color{#35bf28}+4.28\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.0968ms 0.9092ms 1.0998 KOps/s 897.3809 Ops/s $\textbf{\color{#35bf28}+22.56\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6334s 17.7982ms 56.1853 Ops/s 194.4992 Ops/s $\textbf{\color{#d91a1a}-71.11\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.8095ms 1.8542ms 539.3207 Ops/s 563.9204 Ops/s $\color{#d91a1a}-4.36\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.0093ms 1.0986ms 910.2235 Ops/s 775.0537 Ops/s $\textbf{\color{#35bf28}+17.44\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 8.5067ms 5.3420ms 187.1944 Ops/s 186.0174 Ops/s $\color{#35bf28}+0.63\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 8.7413ms 2.0583ms 485.8386 Ops/s 532.8509 Ops/s $\textbf{\color{#d91a1a}-8.82\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.1607ms 1.2436ms 804.1253 Ops/s 897.8005 Ops/s $\textbf{\color{#d91a1a}-10.43\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 45.1670ms 40.2258ms 24.8596 Ops/s 24.6188 Ops/s $\color{#35bf28}+0.98\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.8061ms 18.5335ms 53.9564 Ops/s 53.1865 Ops/s $\color{#35bf28}+1.45\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 46.3212ms 41.4054ms 24.1515 Ops/s 23.9193 Ops/s $\color{#35bf28}+0.97\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.3752ms 18.9234ms 52.8447 Ops/s 53.0760 Ops/s $\color{#d91a1a}-0.44\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 45.8199ms 43.7863ms 22.8382 Ops/s 22.8524 Ops/s $\color{#d91a1a}-0.06\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.9346ms 20.8060ms 48.0631 Ops/s 48.9276 Ops/s $\color{#d91a1a}-1.77\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8864ms 0.2342ms 4.2702 KOps/s 4.4341 KOps/s $\color{#d91a1a}-3.70\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.8708ms 1.5127ms 661.0860 Ops/s 702.5375 Ops/s $\textbf{\color{#d91a1a}-5.90\%}$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6190ms 2.4136ms 414.3249 Ops/s 413.7479 Ops/s $\color{#35bf28}+0.14\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.4093ms 3.1296ms 319.5310 Ops/s 337.1785 Ops/s $\textbf{\color{#d91a1a}-5.23\%}$
test_storage_write_contiguous[50-img_shape0-small] 0.2099ms 0.1394ms 7.1760 KOps/s 7.1225 KOps/s $\color{#35bf28}+0.75\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3214ms 0.1935ms 5.1692 KOps/s 5.4141 KOps/s $\color{#d91a1a}-4.52\%$
test_storage_write_contiguous[100-img_shape2-large_img] 2.1604ms 1.8841ms 530.7639 Ops/s 571.4324 Ops/s $\textbf{\color{#d91a1a}-7.12\%}$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.6954ms 1.4087ms 709.8889 Ops/s 767.6196 Ops/s $\textbf{\color{#d91a1a}-7.52\%}$
test_collector_stack_then_write[50-img_shape0-small] 1.2670ms 1.1441ms 874.0681 Ops/s 879.1337 Ops/s $\color{#d91a1a}-0.58\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.9096ms 3.6872ms 271.2087 Ops/s 276.7493 Ops/s $\color{#d91a1a}-2.00\%$
test_collector_stack_then_write[100-img_shape2-large_img] 10.2102ms 5.8679ms 170.4194 Ops/s 178.3045 Ops/s $\color{#d91a1a}-4.42\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 15.5368ms 7.3857ms 135.3959 Ops/s 143.5517 Ops/s $\textbf{\color{#d91a1a}-5.68\%}$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4688ms 0.2920ms 3.4243 KOps/s 3.5774 KOps/s $\color{#d91a1a}-4.28\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.8689ms 1.6194ms 617.5014 Ops/s 641.6081 Ops/s $\color{#d91a1a}-3.76\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8441ms 2.5588ms 390.8061 Ops/s 392.7740 Ops/s $\color{#d91a1a}-0.50\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.5837ms 3.2931ms 303.6646 Ops/s 315.0862 Ops/s $\color{#d91a1a}-3.62\%$
test_collector_without_rb[100-img_shape0-atari] 34.2180ms 33.6906ms 29.6819 Ops/s 29.8793 Ops/s $\color{#d91a1a}-0.66\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.9777ms 66.2043ms 15.1048 Ops/s 15.2401 Ops/s $\color{#d91a1a}-0.89\%$
test_collector_with_rb[100-img_shape0-atari] 38.9950ms 38.3620ms 26.0675 Ops/s 26.2389 Ops/s $\color{#d91a1a}-0.65\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.8274ms 75.0421ms 13.3259 Ops/s 13.3586 Ops/s $\color{#d91a1a}-0.25\%$
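
The table columns appear to be related by simple arithmetic: Ops is the reciprocal of Mean, and Change is the relative difference between Ops and Ops on Repo HEAD. The sketch below checks this reading against three rows of the table. The helper names are hypothetical, and the formulas are inferred from the numbers rather than taken from the benchmark tooling.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change of this PR's throughput against repo HEAD (assumed formula)."""
    return (ops - ops_head) / ops_head * 100.0


# test_tensor_to_bytestream_speed[pickle]: Mean 84.0190 us, 11.9021 vs 11.7690 KOps/s
pickle_kops = ops_per_second(84.0190e-6) / 1e3
pickle_change = relative_change(11.9021, 11.7690)

# test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]
populate_random = relative_change(196.9765, 49.2032)

# test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]
populate_without_replacement = relative_change(56.1853, 194.4992)

print(round(pickle_kops, 2), round(pickle_change, 2))
print(round(populate_random, 2), round(populate_without_replacement, 2))
```

Under these assumptions the three changes come out as +1.13%, +300.33% and -71.11%, matching the Change column for those rows.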

@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}8$. Worsened: $\large\color{#d91a1a}9$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_tensor_to_bytestream_speed[pickle] 82.2061μs 81.3948μs 12.2858 KOps/s 12.2276 KOps/s $\color{#35bf28}+0.48\%$
test_tensor_to_bytestream_speed[torch.save] 0.1506ms 0.1487ms 6.7248 KOps/s 7.0173 KOps/s $\color{#d91a1a}-4.17\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1188s 0.1185s 8.4361 Ops/s 8.2795 Ops/s $\color{#35bf28}+1.89\%$
test_tensor_to_bytestream_speed[numpy] 2.5125μs 2.5084μs 398.6571 KOps/s 395.6212 KOps/s $\color{#35bf28}+0.77\%$
test_tensor_to_bytestream_speed[safetensors] 37.1004μs 36.8819μs 27.1135 KOps/s 27.0615 KOps/s $\color{#35bf28}+0.19\%$
test_simple 0.8132s 0.8002s 1.2497 Ops/s 1.2090 Ops/s $\color{#35bf28}+3.37\%$
test_transformed 1.3998s 1.3987s 0.7149 Ops/s 0.7080 Ops/s $\color{#35bf28}+0.99\%$
test_serial 2.4574s 2.3700s 0.4219 Ops/s 0.4192 Ops/s $\color{#35bf28}+0.66\%$
test_parallel 1.9333s 1.8792s 0.5321 Ops/s 0.5487 Ops/s $\color{#d91a1a}-3.01\%$
test_step_mdp_speed[True-True-True-True-True] 0.1688ms 42.0165μs 23.8002 KOps/s 23.6964 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-True-True-True-False] 57.8740μs 22.9129μs 43.6435 KOps/s 43.0469 KOps/s $\color{#35bf28}+1.39\%$
test_step_mdp_speed[True-True-True-False-True] 59.8130μs 23.1203μs 43.2520 KOps/s 42.8717 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[True-True-True-False-False] 42.1330μs 12.7176μs 78.6310 KOps/s 78.0322 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-False-True-True] 78.2850μs 44.2376μs 22.6052 KOps/s 22.1760 KOps/s $\color{#35bf28}+1.94\%$
test_step_mdp_speed[True-True-False-True-False] 58.6640μs 25.5491μs 39.1403 KOps/s 38.8396 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-False-False-True] 54.0530μs 25.7454μs 38.8419 KOps/s 38.2686 KOps/s $\color{#35bf28}+1.50\%$
test_step_mdp_speed[True-True-False-False-False] 52.8230μs 15.3020μs 65.3511 KOps/s 64.3435 KOps/s $\color{#35bf28}+1.57\%$
test_step_mdp_speed[True-False-True-True-True] 79.7550μs 46.0832μs 21.6999 KOps/s 21.5996 KOps/s $\color{#35bf28}+0.46\%$
test_step_mdp_speed[True-False-True-True-False] 58.7840μs 28.2841μs 35.3555 KOps/s 35.6327 KOps/s $\color{#d91a1a}-0.78\%$
test_step_mdp_speed[True-False-True-False-True] 59.2930μs 25.5939μs 39.0717 KOps/s 37.9379 KOps/s $\color{#35bf28}+2.99\%$
test_step_mdp_speed[True-False-True-False-False] 47.0730μs 15.2494μs 65.5762 KOps/s 65.2903 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[True-False-False-True-True] 79.6450μs 49.1063μs 20.3640 KOps/s 20.4559 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[True-False-False-True-False] 63.1640μs 30.2491μs 33.0589 KOps/s 32.1612 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[True-False-False-False-True] 59.2040μs 28.3910μs 35.2225 KOps/s 34.8156 KOps/s $\color{#35bf28}+1.17\%$
test_step_mdp_speed[True-False-False-False-False] 47.3130μs 17.6171μs 56.7632 KOps/s 55.8781 KOps/s $\color{#35bf28}+1.58\%$
test_step_mdp_speed[False-True-True-True-True] 81.9240μs 46.9618μs 21.2939 KOps/s 21.1186 KOps/s $\color{#35bf28}+0.83\%$
test_step_mdp_speed[False-True-True-True-False] 63.3230μs 27.8784μs 35.8701 KOps/s 35.2236 KOps/s $\color{#35bf28}+1.84\%$
test_step_mdp_speed[False-True-True-False-True] 2.5122ms 29.7716μs 33.5891 KOps/s 33.1040 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-True-True-False-False] 48.4630μs 17.1018μs 58.4733 KOps/s 58.7743 KOps/s $\color{#d91a1a}-0.51\%$
test_step_mdp_speed[False-True-False-True-True] 79.7940μs 49.0506μs 20.3871 KOps/s 20.2514 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[False-True-False-True-False] 64.1740μs 30.6387μs 32.6384 KOps/s 33.1177 KOps/s $\color{#d91a1a}-1.45\%$
test_step_mdp_speed[False-True-False-False-True] 61.6640μs 31.7842μs 31.4622 KOps/s 31.2170 KOps/s $\color{#35bf28}+0.79\%$
test_step_mdp_speed[False-True-False-False-False] 60.3040μs 19.3638μs 51.6427 KOps/s 52.2289 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[False-False-True-True-True] 84.9650μs 51.6157μs 19.3739 KOps/s 19.1052 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-False-True-True-False] 64.5240μs 33.6370μs 29.7292 KOps/s 30.4645 KOps/s $\color{#d91a1a}-2.41\%$
test_step_mdp_speed[False-False-True-False-True] 71.1540μs 32.0997μs 31.1529 KOps/s 30.9651 KOps/s $\color{#35bf28}+0.61\%$
test_step_mdp_speed[False-False-True-False-False] 49.8930μs 19.3643μs 51.6415 KOps/s 51.8894 KOps/s $\color{#d91a1a}-0.48\%$
test_step_mdp_speed[False-False-False-True-True] 91.2550μs 55.1574μs 18.1299 KOps/s 18.6104 KOps/s $\color{#d91a1a}-2.58\%$
test_step_mdp_speed[False-False-False-True-False] 78.6450μs 36.0440μs 27.7439 KOps/s 28.2113 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[False-False-False-False-True] 0.1047ms 34.2862μs 29.1662 KOps/s 29.0991 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[False-False-False-False-False] 50.2530μs 22.0058μs 45.4425 KOps/s 45.6486 KOps/s $\color{#d91a1a}-0.45\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.8527s 0.7514s 1.3309 Ops/s 1.3405 Ops/s $\color{#d91a1a}-0.72\%$
test_non_tensor_env_rollout_speed[1000-single-False] 0.7103s 0.6084s 1.6438 Ops/s 1.6404 Ops/s $\color{#35bf28}+0.20\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.7432s 1.6553s 0.6041 Ops/s 0.6055 Ops/s $\color{#d91a1a}-0.23\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.5092s 1.4264s 0.7011 Ops/s 0.7006 Ops/s $\color{#35bf28}+0.06\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9813s 1.8968s 0.5272 Ops/s 0.5274 Ops/s $\color{#d91a1a}-0.04\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7579s 1.6787s 0.5957 Ops/s 0.5987 Ops/s $\color{#d91a1a}-0.51\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.7145s 4.5968s 0.2175 Ops/s 0.2159 Ops/s $\color{#35bf28}+0.74\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.4923s 4.4425s 0.2251 Ops/s 0.2251 Ops/s $-0.01\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9674s 1.8967s 0.5272 Ops/s 0.5292 Ops/s $\color{#d91a1a}-0.36\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.7285s 1.6273s 0.6145 Ops/s 0.6258 Ops/s $\color{#d91a1a}-1.80\%$
test_values[generalized_advantage_estimate-True-True] 23.1400ms 21.2675ms 47.0200 Ops/s 46.7034 Ops/s $\color{#35bf28}+0.68\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1337s 3.6082ms 277.1493 Ops/s 280.2886 Ops/s $\color{#d91a1a}-1.12\%$
test_values[td0_return_estimate-False-False] 0.1084ms 84.6432μs 11.8143 KOps/s 11.8446 KOps/s $\color{#d91a1a}-0.26\%$
test_values[td1_return_estimate-False-False] 50.9713ms 49.8219ms 20.0715 Ops/s 20.1528 Ops/s $\color{#d91a1a}-0.40\%$
test_values[vec_td1_return_estimate-False-False] 1.3829ms 1.1152ms 896.6628 Ops/s 907.0700 Ops/s $\color{#d91a1a}-1.15\%$
test_values[td_lambda_return_estimate-True-False] 84.1277ms 81.9853ms 12.1973 Ops/s 12.3752 Ops/s $\color{#d91a1a}-1.44\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3377ms 1.1006ms 908.6202 Ops/s 909.8099 Ops/s $\color{#d91a1a}-0.13\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 21.2771ms 21.1250ms 47.3373 Ops/s 47.3050 Ops/s $\color{#35bf28}+0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0496ms 0.7731ms 1.2935 KOps/s 1.2938 KOps/s $\color{#d91a1a}-0.03\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8390ms 0.6916ms 1.4460 KOps/s 1.4441 KOps/s $\color{#35bf28}+0.13\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5354ms 1.5044ms 664.6974 Ops/s 665.7213 Ops/s $\color{#d91a1a}-0.15\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7852ms 0.7081ms 1.4123 KOps/s 1.4130 KOps/s $\color{#d91a1a}-0.05\%$
test_dqn_speed[False-None] 1.7124ms 1.6258ms 615.0795 Ops/s 621.5379 Ops/s $\color{#d91a1a}-1.04\%$
test_dqn_speed[False-backward] 2.6423ms 2.2752ms 439.5141 Ops/s 439.6583 Ops/s $\color{#d91a1a}-0.03\%$
test_dqn_speed[True-None] 0.7681ms 0.5925ms 1.6879 KOps/s 1.6765 KOps/s $\color{#35bf28}+0.68\%$
test_dqn_speed[True-backward] 1.3881ms 1.2438ms 803.9649 Ops/s 809.9802 Ops/s $\color{#d91a1a}-0.74\%$
test_dqn_speed[reduce-overhead-None] 0.7677ms 0.6365ms 1.5711 KOps/s 1.5894 KOps/s $\color{#d91a1a}-1.15\%$
test_ddpg_speed[False-None] 3.4718ms 3.1772ms 314.7428 Ops/s 326.7278 Ops/s $\color{#d91a1a}-3.67\%$
test_ddpg_speed[False-backward] 5.0646ms 4.6026ms 217.2676 Ops/s 222.2544 Ops/s $\color{#d91a1a}-2.24\%$
test_ddpg_speed[True-None] 1.6105ms 1.4069ms 710.8025 Ops/s 732.1398 Ops/s $\color{#d91a1a}-2.91\%$
test_ddpg_speed[True-backward] 2.6384ms 2.5527ms 391.7399 Ops/s 391.7254 Ops/s $+0.00\%$
test_ddpg_speed[reduce-overhead-None] 1.6330ms 1.4127ms 707.8465 Ops/s 714.1584 Ops/s $\color{#d91a1a}-0.88\%$
test_sac_speed[False-None] 8.9446ms 8.5939ms 116.3614 Ops/s 116.0574 Ops/s $\color{#35bf28}+0.26\%$
test_sac_speed[False-backward] 12.2541ms 11.8827ms 84.1561 Ops/s 84.0896 Ops/s $\color{#35bf28}+0.08\%$
test_sac_speed[True-None] 2.2441ms 1.8948ms 527.7499 Ops/s 534.1525 Ops/s $\color{#d91a1a}-1.20\%$
test_sac_speed[True-backward] 3.8486ms 3.7457ms 266.9703 Ops/s 271.9216 Ops/s $\color{#d91a1a}-1.82\%$
test_sac_speed[reduce-overhead-None] 16.6397ms 10.1162ms 98.8511 Ops/s 98.7586 Ops/s $\color{#35bf28}+0.09\%$
test_redq_deprec_speed[False-None] 10.5189ms 9.6618ms 103.5005 Ops/s 103.6336 Ops/s $\color{#d91a1a}-0.13\%$
test_redq_deprec_speed[False-backward] 13.6767ms 13.1281ms 76.1725 Ops/s 76.4335 Ops/s $\color{#d91a1a}-0.34\%$
test_redq_deprec_speed[True-None] 2.8770ms 2.6321ms 379.9308 Ops/s 381.4294 Ops/s $\color{#d91a1a}-0.39\%$
test_redq_deprec_speed[True-backward] 4.7404ms 4.3007ms 232.5223 Ops/s 229.2037 Ops/s $\color{#35bf28}+1.45\%$
test_redq_deprec_speed[reduce-overhead-None] 14.5449ms 9.6486ms 103.6421 Ops/s 103.0209 Ops/s $\color{#35bf28}+0.60\%$
test_td3_speed[False-None] 8.6628ms 8.4954ms 117.7105 Ops/s 117.8534 Ops/s $\color{#d91a1a}-0.12\%$
test_td3_speed[False-backward] 12.0296ms 11.1770ms 89.4693 Ops/s 89.7715 Ops/s $\color{#d91a1a}-0.34\%$
test_td3_speed[True-None] 1.7568ms 1.7207ms 581.1545 Ops/s 607.5491 Ops/s $\color{#d91a1a}-4.34\%$
test_td3_speed[True-backward] 3.2956ms 3.2006ms 312.4406 Ops/s 310.9313 Ops/s $\color{#35bf28}+0.49\%$
test_td3_speed[reduce-overhead-None] 98.6499ms 25.9290ms 38.5669 Ops/s 38.1199 Ops/s $\color{#35bf28}+1.17\%$
test_cql_speed[False-None] 18.2812ms 17.9987ms 55.5595 Ops/s 55.6628 Ops/s $\color{#d91a1a}-0.19\%$
test_cql_speed[False-backward] 24.4449ms 23.8655ms 41.9015 Ops/s 42.0936 Ops/s $\color{#d91a1a}-0.46\%$
test_cql_speed[True-None] 3.4891ms 3.3650ms 297.1789 Ops/s 298.9559 Ops/s $\color{#d91a1a}-0.59\%$
test_cql_speed[True-backward] 5.8022ms 5.6194ms 177.9535 Ops/s 175.4869 Ops/s $\color{#35bf28}+1.41\%$
test_cql_speed[reduce-overhead-None] 17.9892ms 11.9073ms 83.9819 Ops/s 83.1174 Ops/s $\color{#35bf28}+1.04\%$
test_a2c_speed[False-None] 3.5866ms 3.4054ms 293.6556 Ops/s 291.2374 Ops/s $\color{#35bf28}+0.83\%$
test_a2c_speed[False-backward] 7.3889ms 6.6500ms 150.3764 Ops/s 149.5127 Ops/s $\color{#35bf28}+0.58\%$
test_a2c_speed[True-None] 1.5435ms 1.3825ms 723.3134 Ops/s 697.8725 Ops/s $\color{#35bf28}+3.65\%$
test_a2c_speed[True-backward] 3.2817ms 3.2303ms 309.5673 Ops/s 321.9697 Ops/s $\color{#d91a1a}-3.85\%$
test_a2c_speed[reduce-overhead-None] 1.2047ms 1.0543ms 948.5247 Ops/s 944.4109 Ops/s $\color{#35bf28}+0.44\%$
test_ppo_speed[False-None] 4.1914ms 4.0769ms 245.2830 Ops/s 237.7262 Ops/s $\color{#35bf28}+3.18\%$
test_ppo_speed[False-backward] 7.9712ms 7.5877ms 131.7921 Ops/s 135.3200 Ops/s $\color{#d91a1a}-2.61\%$
test_ppo_speed[True-None] 1.6346ms 1.5312ms 653.0878 Ops/s 653.8251 Ops/s $\color{#d91a1a}-0.11\%$
test_ppo_speed[True-backward] 3.4717ms 3.4261ms 291.8779 Ops/s 310.5376 Ops/s $\textbf{\color{#d91a1a}-6.01\%}$
test_ppo_speed[reduce-overhead-None] 1.1779ms 1.1029ms 906.7205 Ops/s 900.8899 Ops/s $\color{#35bf28}+0.65\%$
test_reinforce_speed[False-None] 3.2088ms 2.4615ms 406.2490 Ops/s 410.2081 Ops/s $\color{#d91a1a}-0.97\%$
test_reinforce_speed[False-backward] 3.7660ms 3.6105ms 276.9692 Ops/s 276.4654 Ops/s $\color{#35bf28}+0.18\%$
test_reinforce_speed[True-None] 1.5068ms 1.3885ms 720.2065 Ops/s 735.6074 Ops/s $\color{#d91a1a}-2.09\%$
test_reinforce_speed[True-backward] 3.3597ms 3.2113ms 311.4044 Ops/s 323.8252 Ops/s $\color{#d91a1a}-3.84\%$
test_reinforce_speed[reduce-overhead-None] 15.8549ms 8.8956ms 112.4156 Ops/s 112.1263 Ops/s $\color{#35bf28}+0.26\%$
test_iql_speed[False-None] 10.4712ms 9.8516ms 101.5060 Ops/s 101.5894 Ops/s $\color{#d91a1a}-0.08\%$
test_iql_speed[False-backward] 14.7354ms 13.9830ms 71.5155 Ops/s 72.9793 Ops/s $\color{#d91a1a}-2.01\%$
test_iql_speed[True-None] 2.4591ms 2.2963ms 435.4738 Ops/s 434.0339 Ops/s $\color{#35bf28}+0.33\%$
test_iql_speed[True-backward] 5.4792ms 5.0681ms 197.3122 Ops/s 204.2989 Ops/s $\color{#d91a1a}-3.42\%$
test_iql_speed[reduce-overhead-None] 16.1719ms 10.0347ms 99.6542 Ops/s 99.2410 Ops/s $\color{#35bf28}+0.42\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3358ms 5.9394ms 168.3662 Ops/s 167.4721 Ops/s $\color{#35bf28}+0.53\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7269ms 0.3535ms 2.8289 KOps/s 2.7398 KOps/s $\color{#35bf28}+3.25\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5595ms 0.3095ms 3.2313 KOps/s 2.8734 KOps/s $\textbf{\color{#35bf28}+12.45\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.1228ms 5.7385ms 174.2604 Ops/s 172.4803 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9937ms 0.3508ms 2.8508 KOps/s 3.2107 KOps/s $\textbf{\color{#d91a1a}-11.21\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6096ms 0.3015ms 3.3166 KOps/s 3.1084 KOps/s $\textbf{\color{#35bf28}+6.70\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.5588ms 1.3294ms 752.2010 Ops/s 779.6286 Ops/s $\color{#d91a1a}-3.52\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5536ms 1.2408ms 805.9571 Ops/s 831.3161 Ops/s $\color{#d91a1a}-3.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 9.2056ms 6.0293ms 165.8563 Ops/s 166.6352 Ops/s $\color{#d91a1a}-0.47\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2495ms 0.4422ms 2.2613 KOps/s 2.2782 KOps/s $\color{#d91a1a}-0.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6489ms 0.4258ms 2.3483 KOps/s 2.3466 KOps/s $\color{#35bf28}+0.07\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9107ms 5.8232ms 171.7282 Ops/s 171.9553 Ops/s $\color{#d91a1a}-0.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9083ms 0.2903ms 3.4443 KOps/s 2.9381 KOps/s $\textbf{\color{#35bf28}+17.23\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4747ms 0.2762ms 3.6210 KOps/s 2.6644 KOps/s $\textbf{\color{#35bf28}+35.90\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.0220ms 5.7361ms 174.3338 Ops/s 173.8428 Ops/s $\color{#35bf28}+0.28\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6716ms 0.3211ms 3.1144 KOps/s 3.3974 KOps/s $\textbf{\color{#d91a1a}-8.33\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5389ms 0.3142ms 3.1827 KOps/s 3.3446 KOps/s $\color{#d91a1a}-4.84\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 8.9060ms 5.9217ms 168.8695 Ops/s 166.0937 Ops/s $\color{#35bf28}+1.67\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7794ms 0.4790ms 2.0875 KOps/s 2.2445 KOps/s $\textbf{\color{#d91a1a}-6.99\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8650ms 0.4831ms 2.0700 KOps/s 2.3560 KOps/s $\textbf{\color{#d91a1a}-12.14\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.9478s 23.9421ms 41.7674 Ops/s 194.8081 Ops/s $\textbf{\color{#d91a1a}-78.56\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9898ms 1.9329ms 517.3542 Ops/s 552.7010 Ops/s $\textbf{\color{#d91a1a}-6.40\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.3539ms 1.3064ms 765.4332 Ops/s 805.3738 Ops/s $\color{#d91a1a}-4.96\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 6.5465ms 5.0412ms 198.3665 Ops/s 160.5633 Ops/s $\textbf{\color{#35bf28}+23.54\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 3.9159ms 1.8486ms 540.9478 Ops/s 477.7089 Ops/s $\textbf{\color{#35bf28}+13.24\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.3134ms 1.2703ms 787.2191 Ops/s 704.6573 Ops/s $\textbf{\color{#35bf28}+11.72\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.6584s 18.3065ms 54.6253 Ops/s 185.4539 Ops/s $\textbf{\color{#d91a1a}-70.55\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 6.2367ms 2.1421ms 466.8310 Ops/s 467.6496 Ops/s $\color{#d91a1a}-0.18\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.3919ms 1.1848ms 844.0400 Ops/s 55.4431 Ops/s $\textbf{\color{#35bf28}+1422.35\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 41.0489ms 38.9263ms 25.6896 Ops/s 25.7023 Ops/s $\color{#d91a1a}-0.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.7881ms 18.1774ms 55.0135 Ops/s 54.4410 Ops/s $\color{#35bf28}+1.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.5686ms 40.1481ms 24.9078 Ops/s 24.3933 Ops/s $\color{#35bf28}+2.11\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.9574ms 18.7929ms 53.2117 Ops/s 53.2255 Ops/s $\color{#d91a1a}-0.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 44.5462ms 42.4405ms 23.5624 Ops/s 23.6306 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7735ms 20.1644ms 49.5923 Ops/s 49.3862 Ops/s $\color{#35bf28}+0.42\%$
test_storage_write_lazystack[50-img_shape0-small] 0.8743ms 0.2301ms 4.3461 KOps/s 4.5899 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_storage_write_lazystack[100-img_shape1-atari] 1.7055ms 1.3671ms 731.4710 Ops/s 737.7719 Ops/s $\color{#d91a1a}-0.85\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.7545ms 2.3102ms 432.8572 Ops/s 432.4771 Ops/s $\color{#35bf28}+0.09\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.1109ms 2.9199ms 342.4726 Ops/s 347.3819 Ops/s $\color{#d91a1a}-1.41\%$
test_storage_write_contiguous[50-img_shape0-small] 0.5358ms 0.1701ms 5.8805 KOps/s 6.0187 KOps/s $\color{#d91a1a}-2.30\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3931ms 0.2329ms 4.2930 KOps/s 4.2681 KOps/s $\color{#35bf28}+0.58\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9500ms 1.7779ms 562.4643 Ops/s 543.3703 Ops/s $\color{#35bf28}+3.51\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.5857ms 1.3812ms 723.9959 Ops/s 717.5430 Ops/s $\color{#35bf28}+0.90\%$
test_collector_stack_then_write[50-img_shape0-small] 1.3175ms 1.1548ms 865.9871 Ops/s 870.0505 Ops/s $\color{#d91a1a}-0.47\%$
test_collector_stack_then_write[100-img_shape1-atari] 3.8167ms 3.6197ms 276.2695 Ops/s 278.8514 Ops/s $\color{#d91a1a}-0.93\%$
test_collector_stack_then_write[100-img_shape2-large_img] 5.9553ms 5.6735ms 176.2594 Ops/s 172.6535 Ops/s $\color{#35bf28}+2.09\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.1259ms 6.9307ms 144.2848 Ops/s 141.5608 Ops/s $\color{#35bf28}+1.92\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4404ms 0.2777ms 3.6009 KOps/s 3.6454 KOps/s $\color{#d91a1a}-1.22\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7215ms 1.5561ms 642.6174 Ops/s 656.0202 Ops/s $\color{#d91a1a}-2.04\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.7135ms 2.4281ms 411.8408 Ops/s 409.3859 Ops/s $\color{#35bf28}+0.60\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.3980ms 3.1191ms 320.6074 Ops/s 323.8943 Ops/s $\color{#d91a1a}-1.01\%$
test_collector_without_rb[100-img_shape0-atari] 34.4576ms 33.4915ms 29.8584 Ops/s 30.2327 Ops/s $\color{#d91a1a}-1.24\%$
test_collector_without_rb[200-img_shape1-large_batch] 66.7240ms 65.6280ms 15.2374 Ops/s 15.3268 Ops/s $\color{#d91a1a}-0.58\%$
test_collector_with_rb[100-img_shape0-atari] 38.8018ms 37.8771ms 26.4012 Ops/s 26.6543 Ops/s $\color{#d91a1a}-0.95\%$
test_collector_with_rb[200-img_shape1-large_batch] 76.0888ms 75.1861ms 13.3003 Ops/s 13.4810 Ops/s $\color{#d91a1a}-1.34\%$
test_collector_without_rb_cuda[100-img_shape0-atari] 58.2142ms 57.2455ms 17.4686 Ops/s 17.7656 Ops/s $\color{#d91a1a}-1.67\%$
test_collector_without_rb_cuda[200-img_shape1-large_batch] 0.1155s 0.1126s 8.8824 Ops/s 8.9986 Ops/s $\color{#d91a1a}-1.29\%$
test_collector_with_rb_cuda[100-img_shape0-atari] 59.8349ms 58.2246ms 17.1749 Ops/s 17.1434 Ops/s $\color{#35bf28}+0.18\%$
test_collector_with_rb_cuda[200-img_shape1-large_batch] 0.1195s 0.1161s 8.6098 Ops/s 8.7055 Ops/s $\color{#d91a1a}-1.10\%$

[ghstack-poisoned]
@vmoens vmoens closed this Apr 11, 2026

Labels

CLA Signed
Performance
