Managing Multiple Versions of GPU Libraries on Linux
Note
This article was last updated on 2024-12-04; its contents may be out of date.
https://www.cnblogs.com/sddai/p/10278005.html
1 GPU driver
- Driver download page: https://www.nvidia.com/en-us/drivers/ . For the RTX 4080: Driver Version: 550.127.05, Release Date: Tue Oct 22, 2024.
- Check the version.
$ nvidia-smi
Fri Nov 15 13:02:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4080 Off | 00000000:17:00.0 Off | N/A |
| 50% 31C P0 55W / 320W | 1MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4080 Off | 00000000:B1:00.0 Off | N/A |
| 0% 28C P0 47W / 320W | 1MiB / 16376MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
- Other.
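The driver version and GPU list can also be read from a script instead of the table. The following is a minimal sketch, assuming `nvidia-smi` is on PATH and using its `--query-gpu` and `--format=csv,noheader` options; the helper names `parse_gpu_csv` and `query_gpus` are illustrative and not part of any NVIDIA tool.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["index", "name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Turn `nvidia-smi --query-gpu=... --format=csv,noheader` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip rows that do not match the requested fields
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # e.g. the driver is installed but not loaded
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Each returned dict holds plain strings, so the driver version can be compared or logged without parsing the full table.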
2 CUDA
- Download the latest version from https://developer.nvidia.com/cuda-downloads . For example, I chose https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=runfile_local . Older versions are available from https://developer.nvidia.com/cuda-toolkit-archive .
- Message printed when the installation finishes.
$ sudo bash ./cuda_12.6.2_560.35.03_linux.run
[sudo] password for rui:
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-12.6/
Please make sure that
- PATH includes /usr/local/cuda-12.6/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.6/lib64, or, add /usr/local/cuda-12.6/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.6/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
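- Check the CUDA version. From the command line (each nvcc binary reports the version of its own toolkit, so the result depends on which one is invoked):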
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
$ ./nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.
$ /usr/local/cuda-12.6/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
# Check from Python with PyTorch
import torch
print(torch.__version__)  # version of the installed torch
print(torch.version.cuda)  # CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # True if this torch build can use CUDA on this machine
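With several toolkits installed, it can help to extract only the release number from the `nvcc --version` text so that versions can be compared. A minimal sketch, assuming the output keeps the `release X.Y, VX.Y.Z` shape shown above; `parse_nvcc_release` is a hypothetical helper name.

```python
import re

# Matches the "release 12.6" part of the `nvcc --version` output.
_RELEASE_RE = re.compile(r"release\s+(\d+)\.(\d+)")


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 12.6, V12.6.77"
    print(parse_nvcc_release(sample))  # prints (12, 6)
```

Because the result is a tuple of integers, releases compare naturally: `(12, 6) > (11, 5)` is true.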
- Multiple versions side by side. After installing the CUDA versions you want, add the following to your ~/.bashrc or ~/.zshrc.
export CUDA_HOME=/usr/local/cuda-12.6  # set this to the CUDA version you want
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
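To see which toolkits are available before editing the shell profile, the install directories can be listed from a script. The sketch below assumes the toolkits live in `/usr/local/cuda-X.Y`, as in the installer summary above; `parse_toolkit_dir`, `find_toolkits`, and `export_lines` are hypothetical helper names, and the generated lines are only a variant of the snippet above, not an official NVIDIA layout.

```python
import re
from pathlib import Path

# Directory names such as "cuda-12.6"; the bare "cuda" symlink is ignored.
_DIR_RE = re.compile(r"^cuda-(\d+)\.(\d+)$")


def parse_toolkit_dir(name):
    """Return (major, minor) for a name like 'cuda-12.6', else None."""
    match = _DIR_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_toolkits(root="/usr/local"):
    """Map (major, minor) to the toolkit directory for each cuda-X.Y under root."""
    found = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return found
    for entry in root_path.iterdir():
        version = parse_toolkit_dir(entry.name)
        if version is not None and entry.is_dir():
            found[version] = entry
    return found


def export_lines(cuda_home):
    """Build shell lines that select one toolkit (assumed layout: bin/ and lib64/)."""
    return [
        f"export CUDA_HOME={cuda_home}",
        "export PATH=$CUDA_HOME/bin:$PATH",
        "export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}",
    ]


if __name__ == "__main__":
    toolkits = find_toolkits()
    if toolkits:
        newest = max(toolkits)  # highest (major, minor) tuple
        print("\n".join(export_lines(toolkits[newest])))
    else:
        print("# no cuda-X.Y directories found under /usr/local")
```

The script only prints the lines; nothing changes in the current shell until they are pasted into the profile or evaluated there.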
3 cuDNN
4 Python, CUDA, and torch
5 Docker and CUDA
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit
- Install.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Optional: uncomment the experimental package entries in the list file
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
- Configure.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Verify.
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
- Done.
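The verification command can also be scripted, for example as a quick check after a driver or toolkit upgrade. This is a minimal sketch that rebuilds the same `docker run` invocation used in the verify step; `build_verify_command` and `gpu_container_works` are hypothetical helper names, and the check treats a zero exit code from `nvidia-smi` inside the container as success.

```python
import shlex
import shutil
import subprocess


def build_verify_command(image="ubuntu", gpus="all"):
    """Build the `docker run` check for a given image and GPU selection."""
    return ["docker", "run", "--rm", "--runtime=nvidia", "--gpus", gpus, image, "nvidia-smi"]


def gpu_container_works(image="ubuntu"):
    """Return True if a container can run nvidia-smi, False otherwise.

    Calling this really starts a container and may pull the image first.
    """
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(build_verify_command(image), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Only print the command here; call gpu_container_works() to execute it.
    print(shlex.join(build_verify_command()))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before anything is run.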