| Field | Value |
|---|
| Superchip | NVIDIA GB10 Grace Blackwell |
| CPU | 20-core Arm: 10× Cortex-X925 + 10× Cortex-A725 (MediaTek IP) |
| GPU architecture | Blackwell |
| CUDA cores | 6,144 |
| Tensor Cores | 5th generation |
| RT Cores | yes |
| Compute capability | sm_121 |
| Peak AI performance | up to 1 petaFLOP FP4 (theoretical, sparsity enabled) |
| Field | Value |
|---|
| System memory | 128 GB LPDDR5X |
| Memory model | unified and coherent across CPU and GPU |
| CPU↔GPU interconnect | NVLink-C2C, ~6× PCIe Gen5 bandwidth |
| Storage | 4 TB NVMe |
| Field | Value |
|---|
| High-speed | dual ConnectX-7 200GbE, QSFP56, ethernet-only |
| Management | 1GbE |
| Wireless | Wi-Fi |
| Approved CX-7 cables | Amphenol NJAAKK-N911, NJAAKK0006 (0.5 m); Luxshare LMTQF022-SD-R |
| Field | Value |
|---|
| Operating system | DGX OS (Ubuntu base) |
| Architecture | ARM64 / aarch64 |
| Container runtime | NVIDIA Container Runtime for Docker |
| Configuration | Unified memory | Approx. model size |
|---|
| Single Spark | 128 GB | up to ~200B params |
| Two Sparks (200GbE) | 256 GB | up to ~405B params |
| Quad stack | 512 GB | largest open-weight models; 4,000 TFLOPS FP4 |
| Precision | Bytes/param | Example: 70B weights |
|---|
| FP16 | 2 | ~140 GB |
| FP8 / 8-bit | 1 | ~70 GB |
| FP4 / 4-bit | 0.5 | ~35 GB |
KV cache is additional and grows with context length and batch size. Budget for it on top of the weights.