First boot to first model

This tutorial takes you from a freshly delivered DGX Spark to a local language model answering prompts in a browser. Set aside about 45 minutes. You will need the command line only for a handful of short commands near the end.

By the end you will have booted the box, reached it from another computer on your network, and chatted with a model running entirely on the Spark.

Before you start

You need the Spark, its power supply, and one of the following two ways to drive the first-time setup: either a monitor, keyboard, and mouse plugged into the Spark directly, or another computer on the same network to reach it over SSH. Pick whichever is easier. The access method only matters for setup; afterwards you can reach the box however you like.

Boot it

Attach your peripherals: display, keyboard, mouse, and an ethernet cable to your network (use the standard RJ-45 ethernet port, not the larger QSFP ports).
Connect the power supply. The Spark powers on by itself and walks you into the first-time setup utility.
Follow the setup prompts: accept the license, create your user account, set a hostname, and join your network. This is a normal Ubuntu-style first-boot flow because DGX OS is Ubuntu underneath.

Reach it from your laptop

Once setup finishes you will usually want to work from your normal machine rather than at the Spark itself. NVIDIA ships a tool called NVIDIA Sync that helps configure SSH access over your local network. The 5-minute “Set Up Local Network Access” playbook walks through it.

When it is configured, confirm you can reach the box:

```
ssh your-user@your-spark-hostname.local
```

A successful login means everything from here can be done remotely.
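If you want to check the connection non-interactively, for example from a script, a sketch like the following runs two harmless commands on the Spark and reports whether the login worked. It assumes key-based SSH login is already in place (BatchMode stops ssh from prompting for a password). SPARK_USER and SPARK_HOST are hypothetical variable names that default to the placeholders used above; on the Spark, `uname -m` should print `aarch64` and `nvidia-smi -L` should list the GPU.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: substitute the user and hostname you chose during setup.
SPARK_USER="${SPARK_USER:-your-user}"
SPARK_HOST="${SPARK_HOST:-your-spark-hostname.local}"

# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout caps how long it waits for the connection to open.
if ssh -o BatchMode=yes -o ConnectTimeout=5 "${SPARK_USER}@${SPARK_HOST}" 'uname -m; nvidia-smi -L'; then
  echo "ok: reached ${SPARK_HOST}"
else
  echo "error: could not log in to ${SPARK_HOST} over SSH" >&2
fi
```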

Run your first model

The friendliest first workload is Open WebUI with Ollama: Ollama pulls and serves models, and Open WebUI gives you a ChatGPT-style browser interface in front of it. NVIDIA’s 15-minute playbook is the canonical walkthrough, but the shape is:

Install Ollama on the Spark (it has native ARM64 + CUDA support).
Pull a model sized to fit comfortably in the Spark's 128 GB of unified memory. A mid-size reasoning model is a good first pull; the Spark can hold far larger models than fit on a typical workstation GPU.
```
ollama pull qwen3:32b
```
Start Open WebUI (the playbook runs it as a container) and open it in your browser.
Pick your model in the UI and send a prompt.
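Whether a model "fits comfortably in 128 GB" mostly comes down to weights ≈ parameters × bytes per parameter. Here is a minimal sketch of that arithmetic. It assumes weights dominate, ignoring the KV cache, activations, and runtime overhead. It also uses nominal byte widths: 2 for FP16/BF16, 1 for 8-bit, 0.5 for 4-bit quantization. `weights_gb` is a hypothetical helper, not part of Ollama or DGX OS.

```shell
#!/usr/bin/env bash
# Weight-only memory estimate: parameters x bytes per parameter.
# Deliberately ignores KV cache, activations, and runtime overhead.
# weights_gb is a hypothetical helper, not part of Ollama or DGX OS.
weights_gb() {
  local params_billion=$1 bytes_per_param=$2
  # (params_billion * 1e9) params * bytes / 1e9 bytes per GB = params_billion * bytes
  LC_ALL=C awk -v p="$params_billion" -v b="$bytes_per_param" \
    'BEGIN { printf "%.1f\n", p * b }'
}

# Nominal widths: 2 bytes (FP16/BF16), 1 byte (8-bit), 0.5 bytes (4-bit).
for bpp in 2 1 0.5; do
  echo "32B parameters at ${bpp} bytes/param: ~$(weights_gb 32 "$bpp") GB"
done
```

By this estimate, a 32B model needs roughly 64 GB for its weights at 16-bit precision and about 16 GB at 4-bit. Both fit well inside 128 GB. A 70B model at 16-bit (about 140 GB) would not fit without quantization. Quantized model files typically run somewhat larger than the nominal figure, and the same memory also serves the OS and the rest of the runtime, so leave headroom.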

When the model streams a reply back, you are running inference entirely on the Spark, with no cloud involved.
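You can run the same check from the command line against Ollama's HTTP API. A sketch, assuming Ollama is listening on its default address (`http://localhost:11434`) on the Spark and that the model pulled earlier is present; run it inside your SSH session. `OLLAMA_URL`, `MODEL`, and `ask_ollama` are illustrative names, not part of Ollama.

```shell
#!/usr/bin/env bash
# Assumes Ollama's default listen address and the qwen3:32b model pulled earlier;
# override OLLAMA_URL or MODEL in the environment if yours differ.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
MODEL="${MODEL:-qwen3:32b}"

# Send one prompt to /api/generate. "stream": false asks for a single JSON
# object instead of a stream of chunks. The prompt must not contain double
# quotes, since it is spliced into the JSON body as-is.
ask_ollama() {
  local prompt=$1
  # The first request also loads the model into memory, so allow a generous timeout.
  curl -fsS --max-time 300 "${OLLAMA_URL}/api/generate" \
    -d "{\"model\": \"${MODEL}\", \"prompt\": \"${prompt}\", \"stream\": false}"
}

if ask_ollama "Say hello in one short sentence."; then
  echo
  echo "ok: ${MODEL} answered from ${OLLAMA_URL}"
else
  echo "error: no reply from ${OLLAMA_URL} (is Ollama running, and is ${MODEL} pulled?)" >&2
fi
```

If you get back a JSON object whose `response` field holds the model's answer, inference is working. The `-f` flag makes curl treat HTTP errors as failures, such as a request for a model that has not been pulled.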

Where to go next

You have a working box. Two good directions from here:

Serve a local LLM: expose an OpenAI-compatible API so your own apps and tools can call models hosted on the Spark.

Hardware specifications: the lookup sheet for exactly what this hardware can hold, with the bytes-per-parameter math.