Is your feature request related to a problem? Please describe.
Right now, it is not possible to change the training scenario mid-run or to parallelize training to reduce training time.
Suggested solution
One idea is to run multiple containerized environments on different ports and have the agent connect to all of them in parallel. BaseAgent would need to accept a list of ports during initialization and return a list of observations from reset. The step method would need to accept a list of actions, one per environment, and return a list of observations in the same order.
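A rough sketch of what the list-in, list-out interface could look like. This is only an illustration: StubEnvClient is a hypothetical in-process stand-in for the connection to one container, MultiEnvAgent and client_factory are made-up names, and the real BaseAgent may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence


class StubEnvClient:
    """Hypothetical stand-in for a connection to one containerized environment."""

    def __init__(self, port: int) -> None:
        self.port = port
        self.steps = 0

    def reset(self) -> dict:
        self.steps = 0
        return {"port": self.port, "step": self.steps}

    def step(self, action: Any) -> dict:
        self.steps += 1
        return {"port": self.port, "step": self.steps, "last_action": action}


class MultiEnvAgent:
    """Sketch of the proposed interface: one client per port, lists in and out."""

    def __init__(self, ports: Sequence[int], client_factory=StubEnvClient) -> None:
        if not ports:
            raise ValueError("at least one port is required")
        # One client per port; client_factory is an assumed extension point.
        self.clients = [client_factory(port) for port in ports]
        # One worker thread per environment so the calls overlap.
        self.pool = ThreadPoolExecutor(max_workers=len(self.clients))

    def reset(self) -> List[dict]:
        # Executor.map preserves input order, so observations line up with ports.
        return list(self.pool.map(lambda c: c.reset(), self.clients))

    def step(self, actions: Sequence[Any]) -> List[dict]:
        if len(actions) != len(self.clients):
            raise ValueError("expected one action per environment")
        return list(self.pool.map(lambda c, a: c.step(a), self.clients, actions))

    def close(self) -> None:
        self.pool.shutdown()
```

Threads are used here on the assumption that each step is dominated by waiting on the container, so the calls can overlap. Each step still waits for the slowest environment before returning.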
Alternatives considered
- Adapt the environment to change the scenario during reset. This allows the scenario to change, but does not improve parallelization/speed.
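For comparison, the reset-based alternative could be as small as the sketch below. ScenarioCyclingEnv and the scenario names are hypothetical, and the actual environment presumably does more work when loading a scenario.

```python
from itertools import cycle
from typing import Optional, Sequence


class ScenarioCyclingEnv:
    """Hypothetical environment that switches to the next scenario on every reset."""

    def __init__(self, scenarios: Sequence[str]) -> None:
        if not scenarios:
            raise ValueError("at least one scenario is required")
        # cycle() loops over the scenarios indefinitely, in order.
        self._scenarios = cycle(scenarios)
        self.current: Optional[str] = None

    def reset(self) -> dict:
        # Pick the next scenario; a real environment would load it here.
        self.current = next(self._scenarios)
        return {"scenario": self.current, "step": 0}
```

Episodes still run one after another, so this changes the scenario but not the wall-clock training time.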
Additional context
A question here is whether this scales well enough. How many scenarios do we need to train in parallel?