Quick Overview and Comparison
- Software --> Hardware
- fabric manager --> NVSwitch
- CUDA --> NVLink
Note that this is only a rough overview and comparison, and it may not be fully precise. For more details, please refer to the following sections.
What is NVIDIA Fabric Manager?
It's a daemon that runs in the background and manages NVSwitch and NVLink. It's an optional part of the NVIDIA driver. Installing fabricmanager is only useful when your hardware has NVSwitch.
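Since Fabric Manager is a background daemon, a quick way to see whether it is running is to ask systemd. The sketch below assumes a Linux host where the service is installed under the unit name `nvidia-fabricmanager`; the helper name `fabric_manager_state` is made up for illustration. It falls back to a placeholder string when systemctl is missing or gives no answer.

```python
import shutil
import subprocess


def fabric_manager_state(unit="nvidia-fabricmanager"):
    """Return the systemd state of the Fabric Manager unit as a string.

    Assumption: the daemon is installed as the systemd unit
    `nvidia-fabricmanager`. Typical answers from systemd are
    "active", "inactive" or "failed".
    """
    if shutil.which("systemctl") is None:
        return "systemctl-not-found"
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    # `systemctl is-active` prints the state on stdout; it may print
    # nothing when systemd itself is not running (e.g. in a container).
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    print("Fabric Manager state:", fabric_manager_state())
```

On a machine without NVSwitch the unit is usually absent, and the result is expected to be something other than "active".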
Quick Overview of NVSwitch and NVLink
- NVLink: GPU-to-GPU, or GPU-to-CPU on platforms whose CPU supports NVLink; otherwise GPU-to-CPU traffic goes over PCI Express
- NVSwitch: GPU-to-GPU with a fully connected GPU topology at full NVLink speed (note that this is not RDMA[1]).
[1] Note that DGX-1 has no NVSwitch, yet it already supports NVIDIA GPUDirect Remote Direct Memory Access (RDMA). That is to say, RDMA does not depend on NVSwitch. See White Paper: NVIDIA DGX-1 With Tesla V100 System Architecture; The Fastest Platform for Deep Learning
Difference between NVSwitch and NVLink
A rough overall picture:
- NVSwitch is a hardware switch for connecting GPUs. It's a hardware component managed by Fabric Manager.
- NVLink is a hardware interconnect for connecting GPUs. It's a hardware component driven by CUDA.
NVSwitch is a switch architecture designed to further enhance the capabilities of NVLink.
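To see which GPU pairs are actually connected over NVLink on a given machine, one option is to read the matrix printed by `nvidia-smi topo -m`, where a cell of the form `NV#` marks a connection over # bonded NVLinks. The following is a minimal sketch of such a parser, not an official tool: the function name `nvlink_pairs` and the sample matrix `SAMPLE_TOPO` are invented for illustration, and the real output may differ between driver versions.

```python
import re
import shutil
import subprocess

# Hypothetical sample in the style of `nvidia-smi topo -m` (not real output).
SAMPLE_TOPO = """\
        GPU0    GPU1    GPU2    CPU Affinity
GPU0     X      NV2     SYS     0-19
GPU1    NV2      X      SYS     0-19
GPU2    SYS     SYS      X      20-39
"""


def nvlink_pairs(topo_text):
    """Return {(gpu_a, gpu_b): link_count} for GPU pairs marked NV#."""
    # Some nvidia-smi versions decorate the header with ANSI escape codes.
    topo_text = re.sub(r"\x1b\[[0-9;]*m", "", topo_text)
    gpu_name = re.compile(r"GPU\d+")
    cols = []
    pairs = {}
    for line in topo_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not cols:
            # The header is the first line that lists GPU column names.
            cols = [t for t in tokens if gpu_name.fullmatch(t)]
            continue
        if tokens[0] not in cols:
            continue  # legend lines, NIC rows, and so on
        row = cols.index(tokens[0])
        for col, cell in enumerate(tokens[1:1 + len(cols)]):
            match = re.fullmatch(r"NV(\d+)", cell)
            if match and row < col:  # count each pair once
                pairs[(cols[row], cols[col])] = int(match.group(1))
    return pairs


if __name__ == "__main__":
    text = SAMPLE_TOPO
    if shutil.which("nvidia-smi"):
        # Use live data when the NVIDIA driver tools are installed.
        text = subprocess.run(["nvidia-smi", "topo", "-m"],
                              capture_output=True, text=True).stdout
    print(nvlink_pairs(text))
```

In the sample, only GPU0 and GPU1 are linked by NVLink. On an NVSwitch system one would expect every GPU pair to show up in the result, since the switch gives a fully connected topology.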