Fix torch autocast deprecation warning in gradient checkpointing #1010

Draft
Copilot wants to merge 6 commits into master from copilot/fix-autocast-deprecation-warning

Conversation

Contributor

Copilot AI commented Apr 24, 2026

  • Confirmed torch 2.6.0 patches the torch.load weights_only=True RCE vulnerability
  • Confirmed torch 2.6.0+cu121 does not exist; switched CUDA backend from cu121 → cu124
  • Updated pyproject.toml: torch ^2.5 → ^2.6, source URL cu121 → cu124
  • Updated poetry.lock:
    • torch: 2.5.1+cu121 → 2.6.0+cu124
    • All NVIDIA CUDA packages: 12.1.x → 12.4.x versions
    • nvidia-cusparselt-cu12: new dependency 0.6.2 (required by torch 2.6.0)
    • triton: 3.1.0 → 3.2.0
    • sympy: 1.13.3 → 1.13.1 (pinned exactly by torch 2.6.0+cu124)
    • Updated content-hash to match new pyproject.toml
  • Added explanatory comment for use_reentrant=False in gradient_checkpointing_kwargs

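For context, `use_reentrant=False` opts into torch's newer non-reentrant checkpoint implementation. A minimal sketch of the kwarg in use (the two-layer block and shapes here are hypothetical illustrations, not silnlp code):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Hypothetical two-layer block, used only to illustrate the kwarg.
layer1 = torch.nn.Linear(4, 4)
layer2 = torch.nn.Linear(4, 4)

def block(x):
    return layer2(torch.relu(layer1(x)))

x = torch.randn(2, 4, requires_grad=True)
# use_reentrant=False selects the non-reentrant implementation; the
# reentrant default is deprecated in recent torch releases.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```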

Copilot AI linked an issue Apr 24, 2026 that may be closed by this pull request
Copilot AI changed the title from "[WIP] Fix Torch autocast deprecation warning during training" to "Fix torch autocast deprecation warning in gradient checkpointing" Apr 24, 2026
Copilot AI requested a review from benjaminking April 24, 2026 18:21
Collaborator

@benjaminking benjaminking left a comment


This is probably a good change to make, but when I tested it with use_reentrant=False I still got the autocast warning; this time it came from a different line:

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning:
`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
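For reference, the migration the warning asks for looks like this (a standalone sketch, not the internals of checkpoint.py):

```python
import torch

# Deprecated spelling (the one that emits the FutureWarning quoted above):
#     with torch.cpu.amp.autocast():
#         ...
# Replacement recommended by the warning:
with torch.amp.autocast("cpu"):
    a = torch.randn(8, 8)
    b = torch.randn(8, 8)
    c = a @ b  # matmul is autocast-eligible, so c is cast to bfloat16
```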

@benjaminking reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on copilot[bot]).

…cpu.amp.autocast

Agent-Logs-Url: https://github.com/sillsdev/silnlp/sessions/4592cc6c-17bc-498d-9320-6da2e1b8729b

Co-authored-by: benjaminking <1214233+benjaminking@users.noreply.github.com>
Copilot AI and others added 2 commits April 24, 2026 20:16
@benjaminking
Collaborator

This warning was being produced by Torch code, and it looks like PyTorch's policy is to stop producing patches for a minor version once the next one is released. So, the way to fix it was to upgrade the Torch version from 2.4 to 2.6, which also involved moving from CUDA 12.1 to 12.4 with the PyTorch wheel. I'm sure we'll eventually want to upgrade the PyTorch and CUDA versions, but we should discuss whether now is the right time.

On jobs_backlog, the installed CUDA version is 12.4, and on cheetah_47gb, it is 13.0. Weirdly though, this branch works fine in an interactive session on jobs_backlog, but fails with a missing library file (libcudnn.so.9) when a task is sent to jobs_backlog.
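One quick, stdlib-only way to check whether the loader can see a cuDNN shared library in the failing environment (a diagnostic sketch; on the broken worker this would presumably print the fallback message):

```python
import ctypes.util

# Ask the system loader whether a libcudnn shared library is discoverable
# on its search path; returns None when it is not.
name = ctypes.util.find_library("cudnn")
print(name or "libcudnn not on the loader path")
```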

@mshannon-sil
Collaborator

Strange. Thinking through this: the silnlp container already has the pip requirements installed, based on the poetry.lock file at the time the Docker image was built. When running remote execution with this image, ClearML reads poetry.lock and runs the installation process again, though most of the time everything is already installed, so it takes little time. In an interactive session, this installation step does not happen; you're just using the already-installed packages unless you manually install more. I'm guessing this difference is the cause of the different results you're seeing. I'd expect that creating a new Docker image with the updated requirements would make this bug go away, but I'm not 100% positive.
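To tell the two situations apart, it could help to log the installed builds of the packages this PR touches at task startup; a minimal stdlib-only sketch (the `report` helper is hypothetical, not part of silnlp):

```python
from importlib.metadata import version, PackageNotFoundError

def report(pkgs):
    """Return 'name==version' for each installed package, or 'name: missing'."""
    lines = []
    for name in pkgs:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name}: missing")
    return lines

# Packages updated by this PR; in a freshly built image these should match
# poetry.lock, while a stale image would still show the old versions.
for line in report(["torch", "triton", "nvidia-cudnn-cu12"]):
    print(line)
```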


Development

Successfully merging this pull request may close these issues.

Torch autocast deprecation warning

3 participants