Connect three Sparks (ring)
You have three DGX Sparks and want them to act as one machine without buying a switch. Three nodes is the sweet spot for switchless clustering: each Spark has two ConnectX-7 ports, so three of them wire into a ring where every node is directly cabled to the other two.
This was made official in the June 2026 update (NCCL 2.30u1 added three-node ring support), and the combined pool reaches 384 GB of unified memory (3 × 128 GB) for 400B-plus models.
What you need
- Three DGX Sparks on DGX OS with NVIDIA drivers.
- Three approved QSFP cables (the same Amphenol/Luxshare cables from the two-Spark how-to).
- sudo on all three, with matching usernames across nodes (multi-node tooling and passwordless SSH assume it).
Cable the ring
Each Spark has two CX-7 cages. Call the one nearest the Ethernet jack Port0 and the far one Port1. Wire Port0 of each node to Port1 of the next:
- Node1 Port0 → Node2 Port1
- Node2 Port0 → Node3 Port1
- Node3 Port0 → Node1 Port1
That closes the loop. Every port carries a full 200GbE link, and each physical port presents two logical RoCE interfaces (four per machine), which matters for bandwidth tuning (see multi-Spark networking).
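The cabling rule above is "Port0 of node i goes to Port1 of node i+1, wrapping back to Node1 at the end". As a minimal sketch, the shell function below prints that plan so you can check it against the cables before powering on. The name ring_cabling is a hypothetical helper for illustration, not part of any NVIDIA tooling.

```bash
#!/bin/sh
# ring_cabling N: print the ring cabling plan for N nodes.
# Port0 of node i is wired to Port1 of the next node; the last node wraps to Node1.
# (Hypothetical helper for illustration only.)
ring_cabling() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    next=$(( i % n + 1 ))   # node after i, wrapping n -> 1
    echo "Node${i} Port0 -> Node${next} Port1"
    i=$(( i + 1 ))
  done
}

ring_cabling 3
```

For three nodes this reproduces the three cable runs listed above.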
Let the Cluster Assistant do the config (recommended)
The fastest path is the Cluster Assistant in NVIDIA Sync. Starting from devices already enrolled in Sync, it runs a guided workflow that handles the parts that are tedious to do by hand:
- system readiness checks (OTA version, sudo access)
- CX-7 topology detection (an LLDP/BPDU probe runs on each node in parallel)
- IP planning, deconfliction, and netplan application
- bandwidth and latency validation with ib_write_bw/ib_write_lat
- passwordless SSH between nodes, keyed over the CX-7 fabric
When it finishes, the three nodes have a configured RoCE network and node-to-node SSH, ready for your workload.
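If you want to confirm the node-to-node SSH yourself, one option is to try every ordered pair non-interactively. The sketch below assumes you can already reach each node from your workstation and that the hostnames you pass resolve on the nodes themselves. The function name check_ssh_mesh and the hostnames in the usage line are placeholders.

```bash
#!/bin/sh
# check_ssh_mesh HOST...: for every ordered pair (a, b), log in to a and
# attempt a non-interactive SSH hop to b. Returns non-zero if any hop fails.
# (Hypothetical helper; hostnames are whatever your nodes are called.)
check_ssh_mesh() {
  failed=0
  for a in "$@"; do
    for b in "$@"; do
      [ "$a" = "$b" ] && continue
      # BatchMode=yes makes ssh fail instead of prompting for a password.
      if ssh -o BatchMode=yes -o ConnectTimeout=5 "$a" \
           ssh -o BatchMode=yes -o ConnectTimeout=5 "$b" true </dev/null; then
        echo "ok   $a -> $b"
      else
        echo "FAIL $a -> $b"
        failed=1
      fi
    done
  done
  return "$failed"
}

# Usage (placeholder hostnames):
#   check_ssh_mesh spark1 spark2 spark3
```

Three nodes give six ordered pairs, so a healthy cluster prints six "ok" lines.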
Build NCCL for the ring
The other recurring footgun is interface-name consistency. The ring example pins the fast interface to a different name than the two-Spark setup (enP7s7 vs enp1s0f1np1), and these three variables must be identical across all nodes or the collective test fails:
    export NCCL_SOCKET_IFNAME=<your-cx7-iface>
    export UCX_NET_DEVICES=<your-cx7-iface>
    export OMPI_MCA_btl_tcp_if_include=<your-cx7-iface>

Use ibdev2netdev on each node and confirm the name matches before running anything.
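One way to keep the three variables from drifting is to set them from a single value and then check that value against ibdev2netdev on each node. The sketch below assumes ibdev2netdev prints its usual one-line-per-port layout, "device port N ==> interface (Up)" or "(Down)". If your output differs, adjust the field numbers. Both function names are hypothetical.

```bash
#!/bin/sh
# set_ring_iface NAME: export all three variables from one value so they
# cannot disagree with each other. (Hypothetical helper.)
set_ring_iface() {
  iface="$1"
  export NCCL_SOCKET_IFNAME="$iface"
  export UCX_NET_DEVICES="$iface"
  export OMPI_MCA_btl_tcp_if_include="$iface"
}

# iface_is_up NAME: read ibdev2netdev-style lines on stdin and succeed only
# if NAME appears as the interface (field 5) with link state "(Up)" (field 6).
# Assumes the line layout: <dev> port <n> ==> <iface> (Up|Down)
iface_is_up() {
  awk -v want="$1" '$5 == want && $6 == "(Up)" { found = 1 } END { exit !found }'
}

# Usage on a node, with your real interface name in place of the placeholder:
#   set_ring_iface <your-cx7-iface>
#   ibdev2netdev | iface_is_up "$NCCL_SOCKET_IFNAME" || echo "interface missing or down"
```

Running the same two lines on every node, with the same name, is the check the paragraph above describes.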
Verify
Run an NCCL all_reduce across the three nodes. A healthy ring lands in the ~190 Gb/s class once you are driving both logical halves of each port (more on that in multi-Spark networking). From here the cluster is ready for distributed inference (vLLM/Ray, TensorRT-LLM) or distributed training.
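As a rough sketch of that run, the function below launches all_reduce_perf from nccl-tests with one rank per node under Open MPI. It assumes nccl-tests is built with MPI support at the same path on every node. The binary path, hostnames, and message-size range are placeholders. Because nccl-tests reports bandwidth in GB/s, a small converter is included for comparing against a Gb/s figure.

```bash
#!/bin/sh
# run_allreduce BIN HOST...: launch all_reduce_perf with one rank per host.
# Assumes Open MPI's mpirun and an MPI-enabled nccl-tests build.
# (Hypothetical wrapper; BIN and the hostnames are placeholders.)
run_allreduce() {
  bin="$1"; shift
  hosts=""
  for h in "$@"; do hosts="${hosts:+$hosts,}$h:1"; done   # spark1:1,spark2:1,...
  mpirun -np "$#" -H "$hosts" \
    -x NCCL_SOCKET_IFNAME -x UCX_NET_DEVICES -x OMPI_MCA_btl_tcp_if_include \
    "$bin" -b 8 -e 4G -f 2 -g 1
}

# gbytes_to_gbits V: convert GB/s (as printed by nccl-tests) to Gb/s.
gbytes_to_gbits() {
  awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 8 }'
}

# Usage (placeholder path and hostnames):
#   run_allreduce ./build/all_reduce_perf spark1 spark2 spark3
gbytes_to_gbits 23.75   # prints 190.0
```

So a reading near 23.75 GB/s in the nccl-tests output corresponds to the ~190 Gb/s class mentioned above.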
Going to four
Four nodes generally needs a managed 200GbE RoCE switch, because you run out of ports before you can build a full mesh. That is a different setup with its own tradeoffs, covered in multi-Spark networking.
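The port arithmetic behind that limit is short: a full mesh of N nodes needs N-1 direct links per node, and each Spark has two CX-7 ports. The sketch below spells it out. Both function names are hypothetical.

```bash
#!/bin/sh
# ports_for_full_mesh N: direct links each node needs to reach every other node.
ports_for_full_mesh() { echo $(( $1 - 1 )); }

# fits_full_mesh N PORTS: succeed if a full mesh of N nodes fits in PORTS ports per node.
fits_full_mesh() { [ "$(ports_for_full_mesh "$1")" -le "$2" ]; }

for n in 2 3 4; do
  if fits_full_mesh "$n" 2; then
    echo "$n nodes: full mesh fits in 2 ports per node"
  else
    echo "$n nodes: full mesh needs $(ports_for_full_mesh "$n") ports per node, more than the 2 available"
  fi
done
```

Two and three nodes fit, while four nodes would need three ports each.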