Frequently Asked Questions

Q: I want to run operations on the GPU, but Storch seems to hang. What's going on?

Depending on your hardware, CUDA version, and compute capability settings, CUDA may need to just-in-time compile your kernels on first use, which can take several minutes. The result is cached, so subsequent runs should start much faster.
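As a rough illustration, the hang typically shows up on the first CUDA operation, for instance when moving a tensor to the GPU. A minimal sketch, assuming the usual Storch API (`torch.cuda.isAvailable`, `Device`, `to`); subsequent runs hit the compiled-kernel cache and start quickly:

```scala
import torch.*

// Pick the GPU if one is available, falling back to the CPU otherwise.
val device = if torch.cuda.isAvailable then Device.CUDA else Device.CPU

// The first CUDA operation may trigger just-in-time kernel compilation,
// which can take minutes on a cold cache; later runs load cached kernels.
val t = torch.rand(Seq(1024, 1024)).to(device)
val result = t.matmul(t)
```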

If you're unsure, you can watch the size of the cache:

watch -d du -sm ~/.nv/ComputeCache

If it's still growing, it's very likely that CUDA is doing just-in-time compilation.

You can also increase the cache size, up to a maximum of 4GB, to avoid recompilation:

export CUDA_CACHE_MAXSIZE=4294967296

Q: What about GPU support on my Mac?

Recent PyTorch versions provide a new backend based on Apple’s Metal Performance Shaders (MPS). The MPS backend enables GPU-accelerated training on Apple silicon (M1/M2). While JavaCPP ships an ARM build of PyTorch as of version 1.5.10, MPS is not enabled because the CI runners currently run on a macOS version that is too old. If you want to help get this working, check out the corresponding issue.