Skip to content

Specifications

Compute

Field	Value
Superchip	NVIDIA GB10 Grace Blackwell
CPU	20-core Arm: 10× Cortex-X925 + 10× Cortex-A725 (MediaTek IP)
GPU architecture	Blackwell
CUDA cores	6,144
Tensor Cores	5th generation
RT Cores	yes
Compute capability	`sm_121`
Peak AI performance	up to 1 petaFLOP FP4 (theoretical, sparsity enabled)

Memory and storage

Field	Value
System memory	128 GB LPDDR5X
Memory model	unified and coherent across CPU and GPU
CPU↔GPU interconnect	NVLink-C2C, ~6× PCIe Gen5 bandwidth
Storage	4 TB NVMe

Networking

Field	Value
High-speed	dual ConnectX-7 200GbE, QSFP56, ethernet-only
Management	1GbE
Wireless	Wi-Fi
Approved CX-7 cables	Amphenol NJAAKK-N911, NJAAKK0006 (0.5 m); Luxshare LMTQF022-SD-R

Platform

Field	Value
Operating system	DGX OS (Ubuntu base)
Architecture	ARM64 / aarch64
Container runtime	NVIDIA Container Runtime for Docker

Model capacity

Configuration	Unified memory	Approx. model size
Single Spark	128 GB	up to ~200B params
Two Sparks (200GbE)	256 GB	up to ~405B params
Quad stack	512 GB	largest open-weight models; 4,000 TFLOPS FP4

Bytes-per-parameter quick reference

Precision	Bytes/param	Example: 70B weights
FP16	2	~140 GB
FP8 / 8-bit	1	~70 GB
FP4 / 4-bit	0.5	~35 GB

KV cache is additional and grows with context length and batch size. Budget for it on top of the weights.