Fix torch autocast deprecation warning in gradient checkpointing #1010
autocast deprecation warning in gradient checkpointing
Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/718c44c7-fd32-4f0e-923f-1c0164875e59 Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
benjaminking left a comment:
This is probably a good change to make, but when I tested it with `use_reentrant=False` I still got the autocast warning; it just came from a different line this time:
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning:
`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
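The replacement the warning suggests can be sketched in a minimal standalone example (not code from this PR): the device type is passed as the first argument to `torch.amp.autocast` instead of using the per-device `torch.cpu.amp.autocast` class.

```python
import torch

# Deprecated form: torch.cpu.amp.autocast(...)
# New form passes the device type as the first argument:
with torch.amp.autocast("cpu", dtype=torch.bfloat16):
    x = torch.ones(4, 4)
    y = x @ x  # matmul is autocast-eligible, so it runs in bfloat16
print(y.dtype)  # torch.bfloat16
```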
@benjaminking reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: complete! All files reviewed, all discussions resolved (waiting on copilot[bot]).
…cpu.amp.autocast Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/4592cc6c-17bc-498d-9320-6da2e1b8729b Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
…he source Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/3eeb9ac4-ca51-42c0-a741-15f1ccea7d80 Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
This warning was being produced by Torch code, and it looks like PyTorch's policy is to stop producing patches for a release once the next minor version ships. So the way to fix it was to upgrade the Torch version from 2.4 to 2.6, which also involved moving from CUDA 12.1 to 12.4 with the PyTorch wheel. I'm sure we'll eventually want to upgrade the PyTorch and CUDA versions, but we should discuss whether now is the right time.
Strange. Thinking through this: the silnlp container has the pip requirements pre-installed based on the poetry lock file at the time the Docker image was built. When running remote execution with this image, ClearML looks at the poetry.lock file and runs the installation process again, though most of the time everything is already installed, so it takes little time. When running an interactive session, this installation does not happen and you're just using the already-installed packages unless you manually install more. I'm guessing this difference is the cause of any different results you're seeing. I'd think that if you create a new Docker image with the updated requirements, this bug would go away, but I'm not 100% positive.
- `torch.load` with `weights_only=True` (RCE vulnerability)
- pyproject.toml: torch `^2.5` → `^2.6`, source URL cu121 → cu124
- poetry.lock: `content-hash` updated to match the new pyproject.toml
- `use_reentrant=False` in `gradient_checkpointing_kwargs`
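The `use_reentrant=False` flag can be illustrated directly with `torch.utils.checkpoint` (a minimal sketch; the PR itself sets the flag through Hugging Face's `gradient_checkpointing_kwargs`, and the toy layer here is hypothetical):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Hypothetical toy layer, just to exercise the API.
layer = torch.nn.Linear(8, 8)
x = torch.randn(4, 8, requires_grad=True)

# use_reentrant=False selects the non-reentrant checkpoint implementation,
# the recommended setting going forward (omitting it emits a warning).
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()  # gradients flow through the recomputed forward
print(x.grad.shape)  # torch.Size([4, 8])
```

With a Hugging Face model, the same flag is typically passed as `model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})`.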