[Performance] Streamline collector inner loop carrier update #3564

Closed
vmoens wants to merge 2 commits into gh/vmoens/245/base from gh/vmoens/245/head

Collaborator

@vmoens vmoens commented Mar 23, 2026

[ghstack-poisoned]

pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3564

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 15 Pending, 1 Unrelated Failure

As of commit c43948a with merge base a4301ee:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
if self._shuttle_has_no_device:
    self._carrier.clear_device_()
self._carrier.set("collector", collector_data)
self._carrier._set_str(
Collaborator Author


Using private tensordict methods is a footgun; let's not do that.
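
To illustrate the concern, here is a minimal sketch of why calling a private setter directly can be risky. `Carrier` is a hypothetical stand-in written for this example; it is not tensordict's implementation, and the validation it performs is an assumption chosen only to show the pattern of a public method guarding a private fast path.

```python
class Carrier:
    """Hypothetical dict-like container; NOT tensordict's actual code."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._data = {}

    def set(self, key, value):
        # Public entry point: validates the value before storing it.
        if len(value) != self.batch_size:
            raise ValueError(
                f"expected {self.batch_size} items for {key!r}, got {len(value)}"
            )
        self._set_str(key, value)
        return self

    def _set_str(self, key, value):
        # Private fast path: assumes the caller has already validated.
        self._data[key] = value


carrier = Carrier(batch_size=2)
carrier.set("collector", [0, 1])           # goes through validation
carrier._set_str("collector", [0, 1, 2])   # bypasses it: mis-sized entry is stored
corrupted = len(carrier._data["collector"]) != carrier.batch_size
```

In this sketch the private call saves one check per step but leaves the container in a state the public API would have rejected, and the private signature is free to change between releases without notice.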

@vmoens vmoens closed this Apr 11, 2026
@github-actions
Contributor

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 172. Improved: $\large\color{#35bf28}15$. Worsened: $\large\color{#d91a1a}11$.

Name Max Mean Ops Ops on Repo HEAD Change
test_tensor_to_bytestream_speed[pickle] 80.7291μs 79.6768μs 12.5507 KOps/s 11.9655 KOps/s $\color{#35bf28}+4.89\%$
test_tensor_to_bytestream_speed[torch.save] 0.1398ms 0.1387ms 7.2081 KOps/s 7.0881 KOps/s $\color{#35bf28}+1.69\%$
test_tensor_to_bytestream_speed[untyped_storage] 0.1046s 0.1039s 9.6247 Ops/s 9.8788 Ops/s $\color{#d91a1a}-2.57\%$
test_tensor_to_bytestream_speed[numpy] 2.4550μs 2.4515μs 407.9181 KOps/s 413.2605 KOps/s $\color{#d91a1a}-1.29\%$
test_tensor_to_bytestream_speed[safetensors] 35.8824μs 35.6408μs 28.0578 KOps/s 28.6791 KOps/s $\color{#d91a1a}-2.17\%$
test_simple 0.5411s 0.5388s 1.8561 Ops/s 1.7661 Ops/s $\textbf{\color{#35bf28}+5.10\%}$
test_transformed 1.0704s 1.0687s 0.9357 Ops/s 0.9111 Ops/s $\color{#35bf28}+2.70\%$
test_serial 1.6516s 1.6460s 0.6075 Ops/s 0.5942 Ops/s $\color{#35bf28}+2.25\%$
test_parallel 0.9975s 0.9957s 1.0043 Ops/s 0.9666 Ops/s $\color{#35bf28}+3.90\%$
test_step_mdp_speed[True-True-True-True-True] 0.1561ms 40.3535μs 24.7810 KOps/s 24.3750 KOps/s $\color{#35bf28}+1.67\%$
test_step_mdp_speed[True-True-True-True-False] 57.7210μs 22.9532μs 43.5668 KOps/s 44.9319 KOps/s $\color{#d91a1a}-3.04\%$
test_step_mdp_speed[True-True-True-False-True] 72.7310μs 22.7836μs 43.8912 KOps/s 44.3610 KOps/s $\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-True-True-False-False] 36.6110μs 12.5007μs 79.9953 KOps/s 80.2010 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[True-True-False-True-True] 84.3020μs 43.6611μs 22.9037 KOps/s 23.1945 KOps/s $\color{#d91a1a}-1.25\%$
test_step_mdp_speed[True-True-False-True-False] 53.9710μs 25.3260μs 39.4851 KOps/s 41.1222 KOps/s $\color{#d91a1a}-3.98\%$
test_step_mdp_speed[True-True-False-False-True] 76.8220μs 26.2593μs 38.0817 KOps/s 39.3645 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[True-True-False-False-False] 44.6310μs 15.0714μs 66.3508 KOps/s 67.4785 KOps/s $\color{#d91a1a}-1.67\%$
test_step_mdp_speed[True-False-True-True-True] 0.1067ms 47.5031μs 21.0513 KOps/s 21.8098 KOps/s $\color{#d91a1a}-3.48\%$
test_step_mdp_speed[True-False-True-True-False] 68.0810μs 28.6667μs 34.8837 KOps/s 36.7237 KOps/s $\textbf{\color{#d91a1a}-5.01\%}$
test_step_mdp_speed[True-False-True-False-True] 57.2320μs 25.6970μs 38.9151 KOps/s 39.3963 KOps/s $\color{#d91a1a}-1.22\%$
test_step_mdp_speed[True-False-True-False-False] 52.8710μs 15.1916μs 65.8257 KOps/s 66.9572 KOps/s $\color{#d91a1a}-1.69\%$
test_step_mdp_speed[True-False-False-True-True] 88.6310μs 49.2470μs 20.3058 KOps/s 20.9789 KOps/s $\color{#d91a1a}-3.21\%$
test_step_mdp_speed[True-False-False-True-False] 73.8520μs 30.7945μs 32.4734 KOps/s 33.4867 KOps/s $\color{#d91a1a}-3.03\%$
test_step_mdp_speed[True-False-False-False-True] 63.2510μs 29.0864μs 34.3803 KOps/s 35.8590 KOps/s $\color{#d91a1a}-4.12\%$
test_step_mdp_speed[True-False-False-False-False] 52.7610μs 17.4407μs 57.3372 KOps/s 57.4566 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-True-True] 80.2620μs 46.2354μs 21.6285 KOps/s 21.9083 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-True-True-True-False] 59.3210μs 27.7356μs 36.0547 KOps/s 36.7129 KOps/s $\color{#d91a1a}-1.79\%$
test_step_mdp_speed[False-True-True-False-True] 2.4577ms 29.4872μs 33.9130 KOps/s 33.9105 KOps/s $+0.01\%$
test_step_mdp_speed[False-True-True-False-False] 55.7610μs 16.8300μs 59.4176 KOps/s 59.3401 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-True-False-True-True] 82.0910μs 49.2941μs 20.2864 KOps/s 20.9110 KOps/s $\color{#d91a1a}-2.99\%$
test_step_mdp_speed[False-True-False-True-False] 70.2320μs 30.4104μs 32.8835 KOps/s 33.9862 KOps/s $\color{#d91a1a}-3.24\%$
test_step_mdp_speed[False-True-False-False-True] 95.7020μs 32.6177μs 30.6582 KOps/s 32.1820 KOps/s $\color{#d91a1a}-4.73\%$
test_step_mdp_speed[False-True-False-False-False] 59.7710μs 19.5696μs 51.0997 KOps/s 53.6836 KOps/s $\color{#d91a1a}-4.81\%$
test_step_mdp_speed[False-False-True-True-True] 0.1002ms 51.1527μs 19.5493 KOps/s 19.9234 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[False-False-True-True-False] 90.9120μs 32.8311μs 30.4589 KOps/s 31.2806 KOps/s $\color{#d91a1a}-2.63\%$
test_step_mdp_speed[False-False-True-False-True] 91.8120μs 31.8783μs 31.3693 KOps/s 32.0787 KOps/s $\color{#d91a1a}-2.21\%$
test_step_mdp_speed[False-False-True-False-False] 78.7910μs 19.2533μs 51.9393 KOps/s 53.3318 KOps/s $\color{#d91a1a}-2.61\%$
test_step_mdp_speed[False-False-False-True-True] 91.4020μs 52.9661μs 18.8800 KOps/s 19.4326 KOps/s $\color{#d91a1a}-2.84\%$
test_step_mdp_speed[False-False-False-True-False] 75.6110μs 35.6453μs 28.0542 KOps/s 28.6618 KOps/s $\color{#d91a1a}-2.12\%$
test_step_mdp_speed[False-False-False-False-True] 65.8820μs 34.2343μs 29.2105 KOps/s 29.9699 KOps/s $\color{#d91a1a}-2.53\%$
test_step_mdp_speed[False-False-False-False-False] 47.9110μs 21.9964μs 45.4619 KOps/s 47.3811 KOps/s $\color{#d91a1a}-4.05\%$
test_non_tensor_env_rollout_speed[1000-single-True] 0.7060s 0.6990s 1.4306 Ops/s 1.3508 Ops/s $\textbf{\color{#35bf28}+5.91\%}$
test_non_tensor_env_rollout_speed[1000-single-False] 0.6970s 0.5923s 1.6883 Ops/s 1.6711 Ops/s $\color{#35bf28}+1.03\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 1.6916s 1.5995s 0.6252 Ops/s 0.6141 Ops/s $\color{#35bf28}+1.81\%$
test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] 1.4750s 1.3915s 0.7186 Ops/s 0.7024 Ops/s $\color{#35bf28}+2.31\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 1.9419s 1.8507s 0.5403 Ops/s 0.5322 Ops/s $\color{#35bf28}+1.53\%$
test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 1.7206s 1.6381s 0.6104 Ops/s 0.6058 Ops/s $\color{#35bf28}+0.76\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] 4.6838s 4.5433s 0.2201 Ops/s 0.2181 Ops/s $\color{#35bf28}+0.90\%$
test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] 4.3496s 4.2992s 0.2326 Ops/s 0.2259 Ops/s $\color{#35bf28}+2.95\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 1.9095s 1.8152s 0.5509 Ops/s 0.5333 Ops/s $\color{#35bf28}+3.30\%$
test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 1.6239s 1.5506s 0.6449 Ops/s 0.6217 Ops/s $\color{#35bf28}+3.74\%$
test_values[generalized_advantage_estimate-True-True] 10.5542ms 10.1413ms 98.6067 Ops/s 101.5460 Ops/s $\color{#d91a1a}-2.89\%$
test_values[vec_generalized_advantage_estimate-True-True] 20.1942ms 17.5970ms 56.8279 Ops/s 56.3938 Ops/s $\color{#35bf28}+0.77\%$
test_values[td0_return_estimate-False-False] 0.2276ms 0.1291ms 7.7487 KOps/s 7.5762 KOps/s $\color{#35bf28}+2.28\%$
test_values[td1_return_estimate-False-False] 28.6030ms 27.6410ms 36.1781 Ops/s 36.5166 Ops/s $\color{#d91a1a}-0.93\%$
test_values[vec_td1_return_estimate-False-False] 18.4552ms 17.6639ms 56.6126 Ops/s 56.5417 Ops/s $\color{#35bf28}+0.13\%$
test_values[td_lambda_return_estimate-True-False] 42.7589ms 41.5758ms 24.0525 Ops/s 24.7508 Ops/s $\color{#d91a1a}-2.82\%$
test_values[vec_td_lambda_return_estimate-True-False] 18.8804ms 17.6949ms 56.5133 Ops/s 56.7068 Ops/s $\color{#d91a1a}-0.34\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.7945ms 8.6779ms 115.2347 Ops/s 115.3986 Ops/s $\color{#d91a1a}-0.14\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.7375ms 1.5230ms 656.5891 Ops/s 650.7298 Ops/s $\color{#35bf28}+0.90\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4749ms 0.4246ms 2.3553 KOps/s 2.4040 KOps/s $\color{#d91a1a}-2.03\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 35.2457ms 34.9405ms 28.6200 Ops/s 28.6825 Ops/s $\color{#d91a1a}-0.22\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.0274ms 1.7344ms 576.5549 Ops/s 575.9757 Ops/s $\color{#35bf28}+0.10\%$
test_dqn_speed[False-None] 1.5130ms 1.4064ms 711.0430 Ops/s 723.4596 Ops/s $\color{#d91a1a}-1.72\%$
test_dqn_speed[False-backward] 2.0676ms 1.9668ms 508.4428 Ops/s 525.8622 Ops/s $\color{#d91a1a}-3.31\%$
test_dqn_speed[True-None] 0.7737ms 0.5686ms 1.7586 KOps/s 1.7198 KOps/s $\color{#35bf28}+2.26\%$
test_dqn_speed[True-backward] 1.1052ms 1.0504ms 952.0119 Ops/s 842.4319 Ops/s $\textbf{\color{#35bf28}+13.01\%}$
test_dqn_speed[reduce-overhead-None] 0.6452ms 0.5535ms 1.8067 KOps/s 1.7588 KOps/s $\color{#35bf28}+2.72\%$
test_ddpg_speed[False-None] 3.1856ms 2.8414ms 351.9383 Ops/s 358.4120 Ops/s $\color{#d91a1a}-1.81\%$
test_ddpg_speed[False-backward] 4.3312ms 4.1019ms 243.7888 Ops/s 245.9821 Ops/s $\color{#d91a1a}-0.89\%$
test_ddpg_speed[True-None] 1.6559ms 1.4551ms 687.2421 Ops/s 682.5829 Ops/s $\color{#35bf28}+0.68\%$
test_ddpg_speed[True-backward] 2.5589ms 2.4827ms 402.7812 Ops/s 336.7038 Ops/s $\textbf{\color{#35bf28}+19.62\%}$
test_ddpg_speed[reduce-overhead-None] 1.5790ms 1.4351ms 696.8245 Ops/s 706.4508 Ops/s $\color{#d91a1a}-1.36\%$
test_sac_speed[False-None] 9.1602ms 8.1932ms 122.0525 Ops/s 122.2332 Ops/s $\color{#d91a1a}-0.15\%$
test_sac_speed[False-backward] 12.2126ms 11.4781ms 87.1223 Ops/s 86.9187 Ops/s $\color{#35bf28}+0.23\%$
test_sac_speed[True-None] 2.3715ms 2.2245ms 449.5458 Ops/s 430.0462 Ops/s $\color{#35bf28}+4.53\%$
test_sac_speed[True-backward] 4.4459ms 4.1929ms 238.4967 Ops/s 207.8492 Ops/s $\textbf{\color{#35bf28}+14.75\%}$
test_sac_speed[reduce-overhead-None] 2.7757ms 2.2692ms 440.6901 Ops/s 461.0397 Ops/s $\color{#d91a1a}-4.41\%$
test_redq_speed[False-None] 10.9784ms 10.4614ms 95.5897 Ops/s 92.8954 Ops/s $\color{#35bf28}+2.90\%$
test_redq_speed[False-backward] 18.9073ms 18.0457ms 55.4148 Ops/s 55.2581 Ops/s $\color{#35bf28}+0.28\%$
test_redq_speed[True-None] 4.9089ms 4.7171ms 211.9948 Ops/s 199.7616 Ops/s $\textbf{\color{#35bf28}+6.12\%}$
test_redq_speed[reduce-overhead-None] 4.8590ms 4.6151ms 216.6797 Ops/s 212.8508 Ops/s $\color{#35bf28}+1.80\%$
test_redq_deprec_speed[False-None] 11.9509ms 11.2202ms 89.1247 Ops/s 87.6003 Ops/s $\color{#35bf28}+1.74\%$
test_redq_deprec_speed[False-backward] 16.5838ms 16.2380ms 61.5841 Ops/s 60.4018 Ops/s $\color{#35bf28}+1.96\%$
test_redq_deprec_speed[True-None] 3.9491ms 3.7291ms 268.1641 Ops/s 269.9320 Ops/s $\color{#d91a1a}-0.65\%$
test_redq_deprec_speed[True-backward] 7.8864ms 7.6196ms 131.2405 Ops/s 116.8822 Ops/s $\textbf{\color{#35bf28}+12.28\%}$
test_redq_deprec_speed[reduce-overhead-None] 3.9143ms 3.6957ms 270.5819 Ops/s 274.2837 Ops/s $\color{#d91a1a}-1.35\%$
test_td3_speed[False-None] 8.1183ms 8.0244ms 124.6195 Ops/s 123.4828 Ops/s $\color{#35bf28}+0.92\%$
test_td3_speed[False-backward] 11.1291ms 10.9159ms 91.6093 Ops/s 89.4108 Ops/s $\color{#35bf28}+2.46\%$
test_td3_speed[True-None] 2.0991ms 1.8690ms 535.0369 Ops/s 534.1548 Ops/s $\color{#35bf28}+0.17\%$
test_td3_speed[True-backward] 3.8288ms 3.6617ms 273.1006 Ops/s 256.3119 Ops/s $\textbf{\color{#35bf28}+6.55\%}$
test_td3_speed[reduce-overhead-None] 1.8805ms 1.8354ms 544.8355 Ops/s 532.8833 Ops/s $\color{#35bf28}+2.24\%$
test_cql_speed[False-None] 30.1267ms 26.6639ms 37.5039 Ops/s 37.7481 Ops/s $\color{#d91a1a}-0.65\%$
test_cql_speed[False-backward] 40.5190ms 36.1410ms 27.6694 Ops/s 27.5797 Ops/s $\color{#35bf28}+0.33\%$
test_cql_speed[True-None] 12.9020ms 12.6128ms 79.2844 Ops/s 78.9369 Ops/s $\color{#35bf28}+0.44\%$
test_cql_speed[True-backward] 18.6059ms 18.0745ms 55.3267 Ops/s 56.0620 Ops/s $\color{#d91a1a}-1.31\%$
test_cql_speed[reduce-overhead-None] 16.3044ms 12.9199ms 77.4000 Ops/s 77.6969 Ops/s $\color{#d91a1a}-0.38\%$
test_a2c_speed[False-None] 5.7752ms 5.4256ms 184.3125 Ops/s 183.2411 Ops/s $\color{#35bf28}+0.58\%$
test_a2c_speed[False-backward] 12.3604ms 12.0067ms 83.2869 Ops/s 83.2592 Ops/s $\color{#35bf28}+0.03\%$
test_a2c_speed[True-None] 4.1609ms 3.8547ms 259.4202 Ops/s 255.1261 Ops/s $\color{#35bf28}+1.68\%$
test_a2c_speed[True-backward] 9.0539ms 8.8730ms 112.7013 Ops/s 110.0381 Ops/s $\color{#35bf28}+2.42\%$
test_a2c_speed[reduce-overhead-None] 4.0338ms 3.8767ms 257.9523 Ops/s 255.2098 Ops/s $\color{#35bf28}+1.07\%$
test_ppo_speed[False-None] 6.1672ms 5.8296ms 171.5397 Ops/s 166.7827 Ops/s $\color{#35bf28}+2.85\%$
test_ppo_speed[False-backward] 12.9427ms 12.6604ms 78.9865 Ops/s 77.8762 Ops/s $\color{#35bf28}+1.43\%$
test_ppo_speed[True-None] 4.0837ms 3.8558ms 259.3498 Ops/s 257.8933 Ops/s $\color{#35bf28}+0.56\%$
test_ppo_speed[True-backward] 8.9719ms 8.8248ms 113.3175 Ops/s 109.8356 Ops/s $\color{#35bf28}+3.17\%$
test_ppo_speed[reduce-overhead-None] 3.9879ms 3.8340ms 260.8226 Ops/s 255.5518 Ops/s $\color{#35bf28}+2.06\%$
test_reinforce_speed[False-None] 4.8133ms 4.5750ms 218.5803 Ops/s 217.4235 Ops/s $\color{#35bf28}+0.53\%$
test_reinforce_speed[False-backward] 7.8260ms 7.5382ms 132.6572 Ops/s 132.1075 Ops/s $\color{#35bf28}+0.42\%$
test_reinforce_speed[True-None] 3.1872ms 3.0151ms 331.6590 Ops/s 332.3039 Ops/s $\color{#d91a1a}-0.19\%$
test_reinforce_speed[True-backward] 8.4901ms 8.1637ms 122.4936 Ops/s 121.4921 Ops/s $\color{#35bf28}+0.82\%$
test_reinforce_speed[reduce-overhead-None] 3.3292ms 2.9961ms 333.7723 Ops/s 274.7245 Ops/s $\textbf{\color{#35bf28}+21.49\%}$
test_iql_speed[False-None] 20.8282ms 20.0920ms 49.7710 Ops/s 49.2074 Ops/s $\color{#35bf28}+1.15\%$
test_iql_speed[False-backward] 31.5860ms 30.8135ms 32.4533 Ops/s 32.1709 Ops/s $\color{#35bf28}+0.88\%$
test_iql_speed[True-None] 8.9831ms 8.6257ms 115.9319 Ops/s 115.5345 Ops/s $\color{#35bf28}+0.34\%$
test_iql_speed[True-backward] 17.4090ms 16.9657ms 58.9423 Ops/s 57.3310 Ops/s $\color{#35bf28}+2.81\%$
test_iql_speed[reduce-overhead-None] 9.0206ms 8.6911ms 115.0597 Ops/s 111.9840 Ops/s $\color{#35bf28}+2.75\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.0484ms 5.8763ms 170.1738 Ops/s 168.1538 Ops/s $\color{#35bf28}+1.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.9985ms 0.3079ms 3.2475 KOps/s 3.4041 KOps/s $\color{#d91a1a}-4.60\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5510ms 0.2736ms 3.6551 KOps/s 3.6779 KOps/s $\color{#d91a1a}-0.62\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.9372ms 5.6534ms 176.8851 Ops/s 175.0743 Ops/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.7743ms 0.3034ms 3.2958 KOps/s 3.5007 KOps/s $\textbf{\color{#d91a1a}-5.85\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7965ms 0.3041ms 3.2880 KOps/s 3.2606 KOps/s $\color{#35bf28}+0.84\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7095ms 1.4371ms 695.8482 Ops/s 784.1258 Ops/s $\textbf{\color{#d91a1a}-11.26\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6299ms 1.3624ms 733.9836 Ops/s 834.7035 Ops/s $\textbf{\color{#d91a1a}-12.07\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 11.7640ms 5.9228ms 168.8401 Ops/s 171.0708 Ops/s $\color{#d91a1a}-1.30\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.8763ms 0.5051ms 1.9799 KOps/s 2.2141 KOps/s $\textbf{\color{#d91a1a}-10.58\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7202ms 0.4845ms 2.0640 KOps/s 2.3613 KOps/s $\textbf{\color{#d91a1a}-12.59\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.7262ms 5.6159ms 178.0653 Ops/s 173.5395 Ops/s $\color{#35bf28}+2.61\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.9919ms 0.2895ms 3.4540 KOps/s 3.1387 KOps/s $\textbf{\color{#35bf28}+10.05\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4678ms 0.2671ms 3.7439 KOps/s 3.1577 KOps/s $\textbf{\color{#35bf28}+18.56\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.8194ms 5.5688ms 179.5726 Ops/s 175.5939 Ops/s $\color{#35bf28}+2.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0181ms 0.3085ms 3.2418 KOps/s 2.7881 KOps/s $\textbf{\color{#35bf28}+16.27\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5007ms 0.2690ms 3.7179 KOps/s 3.2060 KOps/s $\textbf{\color{#35bf28}+15.97\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.9142ms 5.7214ms 174.7811 Ops/s 170.9541 Ops/s $\color{#35bf28}+2.24\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2456ms 0.4468ms 2.2384 KOps/s 2.2072 KOps/s $\color{#35bf28}+1.41\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6267ms 0.4233ms 2.3625 KOps/s 2.2213 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.3270ms 4.9488ms 202.0702 Ops/s 49.4138 Ops/s $\textbf{\color{#35bf28}+308.93\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 9.7169ms 2.1070ms 474.6019 Ops/s 501.6091 Ops/s $\textbf{\color{#d91a1a}-5.38\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.9475ms 1.2476ms 801.5658 Ops/s 833.8460 Ops/s $\color{#d91a1a}-3.87\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.6366s 17.6255ms 56.7361 Ops/s 200.1120 Ops/s $\textbf{\color{#d91a1a}-71.65\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 11.3015ms 1.9957ms 501.0674 Ops/s 508.5516 Ops/s $\color{#d91a1a}-1.47\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.9627ms 1.1973ms 835.2058 Ops/s 871.7006 Ops/s $\color{#d91a1a}-4.19\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.6384ms 5.1967ms 192.4316 Ops/s 192.1925 Ops/s $\color{#35bf28}+0.12\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 13.7554ms 2.0801ms 480.7397 Ops/s 519.4681 Ops/s $\textbf{\color{#d91a1a}-7.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.2633ms 1.0475ms 954.6308 Ops/s 942.4741 Ops/s $\color{#35bf28}+1.29\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 42.8951ms 38.6964ms 25.8422 Ops/s 25.6371 Ops/s $\color{#35bf28}+0.80\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.1992ms 17.8664ms 55.9710 Ops/s 53.8383 Ops/s $\color{#35bf28}+3.96\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 44.8000ms 39.9242ms 25.0475 Ops/s 24.6598 Ops/s $\color{#35bf28}+1.57\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.7506ms 18.2241ms 54.8724 Ops/s 53.6970 Ops/s $\color{#35bf28}+2.19\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 42.9563ms 41.5255ms 24.0816 Ops/s 23.5176 Ops/s $\color{#35bf28}+2.40\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 0.5960s 31.2542ms 31.9957 Ops/s 49.3319 Ops/s $\textbf{\color{#d91a1a}-35.14\%}$
test_storage_write_lazystack[50-img_shape0-small] 0.8621ms 0.2217ms 4.5097 KOps/s 4.5408 KOps/s $\color{#d91a1a}-0.68\%$
test_storage_write_lazystack[100-img_shape1-atari] 1.6847ms 1.4180ms 705.2229 Ops/s 705.7092 Ops/s $\color{#d91a1a}-0.07\%$
test_storage_write_lazystack[100-img_shape2-large_img] 2.6711ms 2.4675ms 405.2717 Ops/s 416.6749 Ops/s $\color{#d91a1a}-2.74\%$
test_storage_write_lazystack[200-img_shape3-large_batch] 3.2177ms 3.0034ms 332.9542 Ops/s 336.1898 Ops/s $\color{#d91a1a}-0.96\%$
test_storage_write_contiguous[50-img_shape0-small] 0.2158ms 0.1347ms 7.4227 KOps/s 7.4201 KOps/s $\color{#35bf28}+0.04\%$
test_storage_write_contiguous[100-img_shape1-atari] 0.3484ms 0.2017ms 4.9579 KOps/s 5.2172 KOps/s $\color{#d91a1a}-4.97\%$
test_storage_write_contiguous[100-img_shape2-large_img] 1.9387ms 1.7861ms 559.8677 Ops/s 558.9840 Ops/s $\color{#35bf28}+0.16\%$
test_storage_write_contiguous[200-img_shape3-large_batch] 1.4614ms 1.2908ms 774.7038 Ops/s 769.1091 Ops/s $\color{#35bf28}+0.73\%$
test_collector_stack_then_write[50-img_shape0-small] 1.2813ms 1.1020ms 907.4035 Ops/s 904.7091 Ops/s $\color{#35bf28}+0.30\%$
test_collector_stack_then_write[100-img_shape1-atari] 7.5665ms 3.5378ms 282.6592 Ops/s 281.4694 Ops/s $\color{#35bf28}+0.42\%$
test_collector_stack_then_write[100-img_shape2-large_img] 10.5891ms 5.7887ms 172.7517 Ops/s 174.8590 Ops/s $\color{#d91a1a}-1.21\%$
test_collector_stack_then_write[200-img_shape3-large_batch] 7.4980ms 7.0327ms 142.1924 Ops/s 141.1533 Ops/s $\color{#35bf28}+0.74\%$
test_collector_lazystack_then_write[50-img_shape0-small] 0.4700ms 0.2756ms 3.6283 KOps/s 3.5967 KOps/s $\color{#35bf28}+0.88\%$
test_collector_lazystack_then_write[100-img_shape1-atari] 1.7064ms 1.5426ms 648.2607 Ops/s 650.7641 Ops/s $\color{#d91a1a}-0.38\%$
test_collector_lazystack_then_write[100-img_shape2-large_img] 2.8698ms 2.5819ms 387.3045 Ops/s 384.7619 Ops/s $\color{#35bf28}+0.66\%$
test_collector_lazystack_then_write[200-img_shape3-large_batch] 3.4772ms 3.2077ms 311.7489 Ops/s 311.5028 Ops/s $\color{#35bf28}+0.08\%$
test_collector_without_rb[100-img_shape0-atari] 32.7668ms 32.2707ms 30.9879 Ops/s 31.2719 Ops/s $\color{#d91a1a}-0.91\%$
test_collector_without_rb[200-img_shape1-large_batch] 64.2924ms 63.5251ms 15.7418 Ops/s 15.7883 Ops/s $\color{#d91a1a}-0.29\%$
test_collector_with_rb[100-img_shape0-atari] 37.1362ms 36.6480ms 27.2867 Ops/s 27.2283 Ops/s $\color{#35bf28}+0.21\%$
test_collector_with_rb[200-img_shape1-large_batch] 96.3671ms 81.8788ms 12.2132 Ops/s 13.9895 Ops/s $\textbf{\color{#d91a1a}-12.70\%}$
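
The Change column in the table above appears consistent with the relative difference between the Ops and Ops on Repo HEAD columns. A minimal sketch of that arithmetic follows; `pct_change` is a hypothetical helper name, and the formula is inferred from the reported rows rather than taken from the benchmark tooling.

```python
def pct_change(ops: float, ops_head: float) -> float:
    """Relative change of PR throughput versus repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


# Values copied from two rows of the table (same unit within each row).
pickle_row = pct_change(12.5507, 11.9655)     # test_tensor_to_bytestream_speed[pickle]
populate_row = pct_change(56.7361, 200.1120)  # test_rb_populate[...ListStorage-SamplerWithoutReplacement-400]

print(f"{pickle_row:+.2f}%")    # +4.89%
print(f"{populate_row:+.2f}%")  # -71.65%
```

Rows whose Max is far above the Mean (for example the 0.6366s Max against a 17.6255ms Mean in the second row) suggest outlier iterations, so large swings there may reflect noise rather than the change under test.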

Labels: CLA Signed, Collectors, Performance
