# Structure of the NVIDIA DGX A100

# Affinity

To get the best performance from each container, take the internal architecture of the node into account. In particular, the CPU cores, the network adapter, and the memory assigned to a container should be chosen to match the GPU it uses.

### CPU Device Affinity

<pre>
nvidia-smi topo -mp
</pre>

| GPU | CPU cores | NUMA node |
| ------ | ------ | ------ |
| GPU0 | 48-63,176-191 | 3 |
| GPU1 | 48-63,176-191 | 3 |
| GPU2 | 16-31,144-159 | 1 |
| GPU3 | 16-31,144-159 | 1 |
| GPU4 | 112-127,240-255 | 7 |
| GPU5 | 112-127,240-255 | 7 |
| GPU6 | 80-95,208-223 | 5 |
| GPU7 | 80-95,208-223 | 5 |
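
The mapping in the table above can be turned into container placement flags. The following is a minimal sketch, assuming containers are started with `docker run` and GPU support is provided by the NVIDIA Container Toolkit. The `GPU_AFFINITY` dictionary simply copies the table; the helper names `parse_cpu_list` and `docker_affinity_flags` are hypothetical and only serve the illustration. `--gpus`, `--cpuset-cpus`, and `--cpuset-mems` are standard `docker run` options.

```python
# GPU index -> (CPU core list, NUMA node), copied from the table above.
GPU_AFFINITY = {
    0: ("48-63,176-191", 3),
    1: ("48-63,176-191", 3),
    2: ("16-31,144-159", 1),
    3: ("16-31,144-159", 1),
    4: ("112-127,240-255", 7),
    5: ("112-127,240-255", 7),
    6: ("80-95,208-223", 5),
    7: ("80-95,208-223", 5),
}


def parse_cpu_list(cpu_list):
    """Expand a range string such as "48-63,176-191" into a sorted list of core IDs."""
    cores = set()
    for part in cpu_list.split(","):
        first, _, last = part.partition("-")
        # A single core ("5") has no "-", so it is treated as the range 5-5.
        cores.update(range(int(first), int(last or first) + 1))
    return sorted(cores)


def docker_affinity_flags(gpu):
    """Build `docker run` flags that keep a container on the cores and memory local to one GPU."""
    cpus, numa = GPU_AFFINITY[gpu]
    return [
        "--gpus", f"device={gpu}",
        "--cpuset-cpus", cpus,
        "--cpuset-mems", str(numa),
    ]


if __name__ == "__main__":
    # Flags for a container that should run on GPU0.
    print(" ".join(docker_affinity_flags(0)))
    # Number of logical cores local to GPU0.
    print(len(parse_cpu_list(GPU_AFFINITY[0][0])))
```

For GPU0 this yields `--gpus device=0 --cpuset-cpus 48-63,176-191 --cpuset-mems 3`, which restricts the container to the 32 logical cores and the NUMA node listed for that GPU. Other container runtimes or schedulers would need their own equivalent options.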
### Network Device Affinity

<pre>
nvidia-smi topo -m
</pre>

| GPU | Network adapters |
| ------ | ------ |
| GPU0 | mlx5_0, mlx5_1 |
| GPU1 | mlx5_0, mlx5_1 |
| GPU2 | mlx5_2, mlx5_3 |
| GPU3 | mlx5_2, mlx5_3 |
| GPU4 | mlx5_4, mlx5_5 |
| GPU5 | mlx5_4, mlx5_5 |
| GPU6 | mlx5_6, mlx5_7 |
| GPU7 | mlx5_6, mlx5_7 |
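
One way to apply the table above is to restrict a job to the adapters that are local to its GPUs. The sketch below assumes the workload communicates through NCCL, which reads the `NCCL_IB_HCA` environment variable to select InfiniBand adapters, and that GPUs are selected with `CUDA_VISIBLE_DEVICES`. The `GPU_NICS` dictionary copies the table; the helper name `nic_env_for_gpus` is hypothetical.

```python
import os

# GPU index -> network adapters local to that GPU, copied from the table above.
GPU_NICS = {
    0: ("mlx5_0", "mlx5_1"),
    1: ("mlx5_0", "mlx5_1"),
    2: ("mlx5_2", "mlx5_3"),
    3: ("mlx5_2", "mlx5_3"),
    4: ("mlx5_4", "mlx5_5"),
    5: ("mlx5_4", "mlx5_5"),
    6: ("mlx5_6", "mlx5_7"),
    7: ("mlx5_6", "mlx5_7"),
}


def nic_env_for_gpus(gpus):
    """Return environment variables that limit a job to the given GPUs and their local adapters."""
    nics = []
    for gpu in gpus:
        for nic in GPU_NICS[gpu]:
            # GPUs that share adapters must not produce duplicate entries.
            if nic not in nics:
                nics.append(nic)
    return {
        "CUDA_VISIBLE_DEVICES": ",".join(str(gpu) for gpu in gpus),
        # The leading "=" asks NCCL for exact name matches instead of prefix matches.
        "NCCL_IB_HCA": "=" + ",".join(nics),
    }


if __name__ == "__main__":
    # Set the variables before the framework initialises CUDA and NCCL.
    os.environ.update(nic_env_for_gpus([0, 1]))
    print(os.environ["NCCL_IB_HCA"])
```

Whether pinning adapters this way helps depends on the job; NCCL also performs its own topology detection, so this is best treated as an explicit override rather than a requirement.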
### Storage Device Affinity

See the NVIDIA GPUDirect Storage configuration guide: https://docs.nvidia.com/gpudirect-storage/configuration-guide/index.html