Code for the paper "Reducing Exploitability with Population Based Training". We reduce the exploitability of RL agents to adversarial policies by training against a diverse population of opponents.
The code should work with Python 3.7 or 3.8.

Install using Docker or with the following steps.
```
conda create -n defense python=3.8
```

Install the necessary packages:

```
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
For generating videos:

```
conda install ffmpeg
```

ffmpeg can optionally also be installed with your system's package manager.
To change the output path, set `TrialSettings.out_path` via gin-config. This can be overridden with the environment variable `POLICY_DEFENSE_OUT`.
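A minimal sketch of both options (the path is a placeholder; `-p` passes gin bindings, as in the training examples below):

```
# Via a gin binding on the command line (other required arguments omitted):
python -m aprl_defense.train -p "TrialSettings.out_path = '/tmp/defense_out'"

# Via the environment variable, which takes precedence over the gin setting:
export POLICY_DEFENSE_OUT=/tmp/defense_out
```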
Most frequently used settings can be changed via gin. The settings intended to be configured with gin are:

- `TrialSettings` (`aprl_defense.trial.settings.TrialSettings`)
- `RLSettings` (`aprl_defense.trial.settings.RLSettings`)

Additionally, depending on which mode is used, one of:

- `selfplay` (`aprl_defense.training_managers.simple_training_manager.SelfplayTrainingManager`)
- `single-agent` (no additional arguments)
- `attack` (`aprl_defense.training_managers.simple_training_manager.AttackManager`)
- `pbt` (`aprl_defense.training_managers.pbt_manager.PBTManager`)
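As a sketch, a gin file binding the `TrialSettings` parameters might look like this (only parameters that appear elsewhere in this README are used; the values are placeholders):

```
# Sketch of a gin file; values are placeholders, not repo defaults.
TrialSettings.num_workers = 10
TrialSettings.wandb_group = 'experiment'
TrialSettings.out_path = '/tmp/defense_out'
```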
For further documentation on the configurable parameters, check the docstrings of the respective classes.
Experiments for the paper were run with the settings in `src/gin/icml`.

To change hyperparameters, we recommend creating RLlib configs, which can be passed in via the `override` / `override_f` gin settings.
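A hypothetical sketch of such an override (that `override` takes an inline dict, that `override_f` takes a file path, and that both live on `RLSettings` are assumptions; `lr` and `train_batch_size` are standard RLlib config keys):

```
# Assumption: `override` accepts a dict of RLlib config entries.
RLSettings.override = {'lr': 0.0001, 'train_batch_size': 4000}

# Assumption: `override_f` instead points to a file containing such overrides.
# RLSettings.override_f = 'my_rllib_overrides.py'
```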
Many experiments were run using dedicated Python scripts, located in `src/experiments`.

The following examples should clarify how to specify training for the different modes (run from the `src` folder).
Selfplay:

```
python -m aprl_defense.train \
    -f "gin/icml/selfplay/laser_tag.gin" \
    -p "TrialSettings.num_workers = 10" \
    -p "TrialSettings.wandb_group = 'experiment'"
```

Attack:

```
python -m aprl_defense.train \
    -f "gin/icml/attack/sp_laser_tag.gin" \
    -p "TrialSettings.num_workers = 10" \
    -p "TrialSettings.wandb_group = 'experiment'" \
    -p "attack.victim_artifact = '<wandb artifact id>'" \
    -p "attack.victim_policy_name = '<name of victim policy>'"
```

Attention: PBT only runs with the modified version of ray.
PBT:

```
python -m aprl_defense.train \
    -f "gin/icml/pbt/laser_tag.gin" \
    -p "TrialSettings.num_workers = 10" \
    -p "TrialSettings.wandb_group = 'experiment'" \
    -p "pbt.main_id = 0" \
    -p "pbt.num_ops = 50" \
    -p "TrialSettings.num_workers = 50"
```

In all but the most basic setups, creating an RLlib config for multi-agent training requires programmatically building the config in Python; such configs cannot be created simply by passing in a config file. For convenience, the most commonly changed hyperparameters and set-up configurations can be changed with gin; additional modifications can be performed by overriding the RLlib config.