Overview
If you run Docker containers with GPU access on the CircleCI GPU executor, you may see the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0407] error waiting for container: context canceled
This happens because nvidia-container-toolkit was removed when CircleCI switched to images with multiple CUDA versions available at runtime.
Solution
Step 1: Add a step to your config.yml that installs `nvidia-container-toolkit` and restarts Docker
- run:
    name: Install nvidia-container-toolkit and Restart Docker
    command: |
      sudo apt-get update
      sudo apt-get install -y nvidia-container-toolkit
      sudo systemctl restart docker
Step 2: Verify Compatibility
To ensure that Docker and GPUs are working together, you can run a test container using the nvidia/cuda:11.4.3-base-ubuntu20.04 image. This container will execute the nvidia-smi command to display GPU information.
- run:
    name: Test GPU Docker
    command: docker run --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi
If the nvidia-container-toolkit is functioning correctly and Docker can utilize the GPU resources, you should see the GPU information displayed.
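For context, here is a minimal sketch of how the two steps might sit together in a complete job. The job name `gpu-docker-test`, the workflow name `gpu`, the machine image `linux-cuda-11:default`, and the resource class `gpu.nvidia.small` are assumptions for illustration only; substitute the image and GPU resource class your project actually uses.

```yaml
version: 2.1

jobs:
  gpu-docker-test:                      # hypothetical job name
    machine:
      image: linux-cuda-11:default      # assumption: use your own CUDA machine image
    resource_class: gpu.nvidia.small    # assumption: use the GPU class on your plan
    steps:
      # Step 1: restore the toolkit so Docker can select the GPU device driver
      - run:
          name: Install nvidia-container-toolkit and Restart Docker
          command: |
            sudo apt-get update
            sudo apt-get install -y nvidia-container-toolkit
            sudo systemctl restart docker
      # Step 2: confirm that a container can see the GPU
      - run:
          name: Test GPU Docker
          command: docker run --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi

workflows:
  gpu:                                  # hypothetical workflow name
    jobs:
      - gpu-docker-test
```

The install step runs before any step that calls `docker run --gpus`, since Docker has to be restarted after the toolkit is installed for it to take effect.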