(This issue has been moved here from FluxML/Flux.jl#2261)
I have a somewhat complicated training setup and have recently started encountering CUDA out-of-memory errors that only show up after a number of epochs.
I have managed to construct a minimal working example:
using Flux
using FastAI
using MLUtils
using FastAI.FluxTraining

function main()
    DEVICE = gpu
    model = Chain(Dense(32*32*3 => 2048), Dense(2048 => 6), Dense(6 => 32*32*3))

    make_data_sample_test(i) = (rand(Float32, 32*32*3),
                                rand(Float32, 32*32*3))
    data = mapobs(make_data_sample_test, 1:1024)
    dl = DataLoader(data; batchsize=32, collate=true)
    dl_val = DataLoader(data; batchsize=32, collate=true)

    loss = Flux.Losses.logitbinarycrossentropy
    opt = Flux.Adam(3e-4)

    learner = FastAI.Learner(model, loss;
                             optimizer=opt,
                             data=(dl, dl_val),
                             callbacks=[FluxTraining.ToGPU()])

    for _ in 1:5
        FluxTraining.epoch!(learner, FluxTraining.TrainingPhase())
        @show length(opt.state)   # number of arrays the optimiser holds state for
    end
end

After about 50 epochs (~1 minute on my laptop), I get an error that CUDA cannot allocate any more memory.
This seems to be because the optimizer's state accumulates GPU arrays over time: length(opt.state) keeps growing across epochs instead of holding one entry per parameter array.
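My reading of the mechanism, as a minimal CPU-only sketch (copy stands in for a device move here; the real trigger would be whatever rebuilds the parameter arrays each epoch): the legacy Flux.Adam keeps its state in an IdDict keyed on array identity, so any transform that replaces a parameter array strands the old entry.

using Flux

function demo_state_leak()
    opt = Flux.Adam(3e-4)
    W = rand(Float32, 4, 4)
    for _ in 1:3
        # A new array with the same values but a new identity, standing in
        # for a device move such as gpu(cpu(W)):
        W = copy(W)
        Flux.Optimise.apply!(opt, W, rand(Float32, 4, 4))
        @show length(opt.state)   # prints 1, 2, 3 -- one stale entry per "move"
    end
end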
The issue can be fixed by replacing opt = Flux.Adam(3e-4) with opt = Optimisers.Adam(3e-4). However, I think we should also fix the problem for the Flux optimizer, since it seems to be "officially" supported.
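For concreteness, the workaround applied to the MWE above (reusing its model, loss, dl, and dl_val) would look roughly like this:

import Optimisers

# Optimisers.jl rules keep their state in a tree that mirrors the model
# structure, rather than in an IdDict keyed on array identity, so rebuilding
# the parameter arrays does not strand old state:
opt = Optimisers.Adam(3e-4)
learner = FastAI.Learner(model, loss;
                         optimizer=opt,
                         data=(dl, dl_val),
                         callbacks=[FluxTraining.ToGPU()])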
@DrChainsaw has suggested in the other issue that the problem is that the ToDevice callback is not applied to the optimizer parameters. However, I haven't looked into the specifics of how one would fix that. Any insights?
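One direction this could take, purely as a hypothetical sketch (prune_stale_state! is an invented helper, not part of FluxTraining): whenever the model is moved, drop the IdDict entries whose keys are no longer parameters of the current model, so the stranded GPU buffers become collectable.

using Flux

# Hypothetical helper: keep only the state entries that belong to parameters
# of the (possibly just-moved) model, and delete the rest.
function prune_stale_state!(opt, model)
    live = IdDict{Any,Bool}()
    for p in Flux.params(model)
        live[p] = true
    end
    for k in collect(keys(opt.state))
        haskey(live, k) || delete!(opt.state, k)
    end
    return opt
end

This would only patch the legacy optimizer, though; actually moving the existing state to the device along with the model (or switching to the Optimisers.jl state tree) seems like the more principled fix.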