Say you have two paths for moving data from GPU to CPU in a particular system:
We could have one converter that compresses the bytes on the GPU before moving them to the CPU. This would reduce pressure on the bus between the CPU and the GPU, but temporarily increase memory pressure on the GPU while the compression runs.
We could have another that just copies the bytes from the GPU to the CPU without making any temporary allocations.
A system that can work out which of these approaches is the most frugal with respect to memory consumption, or some other resource constraint, could be interesting from a performance and resiliency perspective.
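As a rough illustration of what such a selection step might look like, here is a minimal Python sketch. Everything in it is hypothetical: the names `TransferPath`, `candidate_paths`, and `choose_path`, the fixed compression ratio, and the assumption that the compressed path needs GPU scratch space equal to the size of its compressed output. A real system would have to measure or estimate these costs rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferPath:
    """Estimated resource cost of one way to move data from GPU to CPU (hypothetical model)."""
    name: str
    gpu_scratch_bytes: int  # temporary GPU memory the path needs
    bus_bytes: int          # bytes that cross the GPU-CPU bus


def candidate_paths(payload_bytes, compression_ratio=0.5):
    """Build cost estimates for the two paths described above.

    Assumption: compression shrinks the payload by a fixed ratio and needs
    scratch space on the GPU equal to the compressed size.
    """
    compressed = int(payload_bytes * compression_ratio)
    return [
        TransferPath("compressed", gpu_scratch_bytes=compressed, bus_bytes=compressed),
        TransferPath("direct", gpu_scratch_bytes=0, bus_bytes=payload_bytes),
    ]


def choose_path(paths, free_gpu_bytes, minimize="bus_bytes"):
    """Pick the cheapest path, by the chosen resource, that fits in free GPU memory."""
    feasible = [p for p in paths if p.gpu_scratch_bytes <= free_gpu_bytes]
    if not feasible:
        raise RuntimeError("no transfer path fits in the available GPU memory")
    return min(feasible, key=lambda p: getattr(p, minimize))


paths = candidate_paths(payload_bytes=1000)
# Plenty of free GPU memory: prefer the path that puts less on the bus.
print(choose_path(paths, free_gpu_bytes=1000).name)
# GPU memory is tight: the compressed path no longer fits.
print(choose_path(paths, free_gpu_bytes=100).name)
```

The point of the sketch is only that each path advertises its estimated costs up front, so the chooser can first filter by a hard constraint (free GPU memory) and then rank by whichever resource matters most at that moment.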