Building your first UK on-prem AI cluster: step-by-step

Cloud AI training is convenient, but at sustained scale (multi-week training runs, hundreds of GPUs) on-prem can become meaningfully cheaper. UK enterprises building their first on-prem AI cluster face specific decisions on hardware, fabric, storage, software stack, and facility preparation. This is a practitioner's playbook for those decisions.

Figure: First UK AI cluster, reference architecture

When on-prem AI beats cloud

Sustained utilisation above roughly 50%: the premium you pay the cloud for elasticity stops paying for itself.

Sensitive data and sovereignty requirements: particularly in financial services, healthcare, and defence.

Multi-week training runs: cloud spot pricing helps, but on-prem is more predictable.

Model fine-tuning and iteration cycles: owned infrastructure removes per-experiment cost friction.
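The utilisation threshold above can be sanity-checked for your own quotes with a few lines of arithmetic. The sketch below is a minimal illustration, not a pricing model: the function name and every figure in the example call (capex, opex, lifetime, cloud rate) are hypothetical placeholders, and it ignores financing, staffing, and resale value.

```python
def breakeven_utilisation(capex_gbp, lifetime_years, opex_gbp_per_year,
                          gpus, cloud_gbp_per_gpu_hour):
    """Fraction of the year the GPUs must be busy for on-prem to match cloud cost.

    Assumes straight-line depreciation of capex over the lifetime and that
    cloud is billed only for the hours actually used.
    """
    hours_per_year = 24 * 365
    annual_onprem = capex_gbp / lifetime_years + opex_gbp_per_year
    annual_cloud_at_full_use = gpus * hours_per_year * cloud_gbp_per_gpu_hour
    return annual_onprem / annual_cloud_at_full_use


# Hypothetical inputs: replace with your own quotes.
threshold = breakeven_utilisation(
    capex_gbp=500_000,          # placeholder cluster cost
    lifetime_years=4,           # placeholder depreciation period
    opex_gbp_per_year=60_000,   # placeholder power, colo, support
    gpus=8,
    cloud_gbp_per_gpu_hour=5.0, # placeholder cloud rate
)
print(f"On-prem breaks even above {threshold:.1%} utilisation")
```

If your expected utilisation sits well above the computed threshold, the on-prem case is worth pricing in detail; if it sits below, cloud elasticity is probably still earning its premium.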

Step 1 — Sizing

4-8 GPU starting cluster: 1× Supermicro SYS-821GE (8× H100/H200) or NVIDIA DGX B200.

16-64 GPU production cluster: 2-8× 8-GPU servers + 800G fabric.

128+ GPU pre-training cluster: dedicated AI facility, liquid cooling, 800G AI fabric.

See our NVIDIA GPU roadmap for guidance on choosing a GPU within each size band.
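As a rough illustration of the size bands above, the following sketch maps a target GPU count to a number of 8-GPU servers and the matching band. The band boundaries come from the list above; the function name, the 8-GPU-per-server default, and the "between bands" label for counts the list does not cover are assumptions made for this example.

```python
import math

# Size bands as listed in Step 1: (label, min GPUs, max GPUs or None if open-ended).
BANDS = [
    ("starting", 4, 8),
    ("production", 16, 64),
    ("pre-training", 128, None),
]


def size_cluster(gpus, gpus_per_server=8):
    """Return the server count and size band for a target GPU count."""
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    servers = math.ceil(gpus / gpus_per_server)
    band = "between bands"  # assumption: counts outside the listed ranges
    for label, low, high in BANDS:
        if gpus >= low and (high is None or gpus <= high):
            band = label
    return {"servers": servers, "band": band}


print(size_cluster(8))
print(size_cluster(48))
print(size_cluster(256))
```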

Step 2 — Fabric design

RoCEv2 over Ethernet (Arista 7060X6, Cisco Nexus 9332D-H2R, Juniper QFX5240): the most common UK choice in 2026.

InfiniBand NDR / XDR (NVIDIA Quantum-2 / Quantum-X800): the alternative for pure HPC and lowest-latency training.

Provision 800G for the GPU-to-GPU training fabric, 400G for storage, and a separate management network.
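To get a feel for how many switches the GPU fabric needs, a non-blocking two-tier leaf-spine layout can be estimated from the endpoint count and the switch radix. This is a port-count sketch only: the function name is hypothetical, the 64-port default is an assumed radix rather than the spec of any switch named above, and it assumes one fabric port per GPU. A real design also has to account for rail layout, breakout cabling, and spare ports.

```python
import math


def leaf_spine_counts(endpoints, radix=64):
    """Estimate leaf and spine switch counts for a non-blocking two-tier fabric.

    Each leaf uses half its ports for endpoints and half for uplinks.
    """
    if endpoints < 1:
        raise ValueError("endpoints must be a positive integer")
    if endpoints <= radix:
        # Everything fits on a single switch, so no spine tier is needed.
        return {"leaves": 1, "spines": 0}
    downlinks_per_leaf = radix // 2
    leaves = math.ceil(endpoints / downlinks_per_leaf)
    if leaves > radix:
        raise ValueError("too many endpoints for a two-tier fabric at this radix")
    total_uplinks = leaves * downlinks_per_leaf  # uplinks match downlinks 1:1
    spines = math.ceil(total_uplinks / radix)
    return {"leaves": leaves, "spines": spines}


print(leaf_spine_counts(64))    # 64 GPUs, one fabric port each
print(leaf_spine_counts(512))   # 512 GPUs, one fabric port each
```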

Step 3 — Storage

Hot tier: Pure FlashBlade or VAST Data — high parallel throughput for training data loading.

Warm tier: NetApp ONTAP or Dell PowerScale.

Archive: AWS S3 / Azure Blob / local object storage.
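The hot tier is sized by how fast the GPUs consume training data. A back-of-envelope estimate of the aggregate read throughput is sketched below; the function name and all input figures are hypothetical, and the calculation ignores caching, compression, and checkpoint traffic.

```python
def required_read_gb_per_s(gpus, samples_per_s_per_gpu, bytes_per_sample):
    """Aggregate read throughput (GB/s) needed to keep every GPU fed with data."""
    return gpus * samples_per_s_per_gpu * bytes_per_sample / 1e9


# Hypothetical workload: 64 GPUs, each consuming 2,000 samples/s of 150 kB.
demand = required_read_gb_per_s(
    gpus=64,
    samples_per_s_per_gpu=2_000,
    bytes_per_sample=150_000,
)
print(f"Hot tier should sustain about {demand:.1f} GB/s of reads")
```

Comparing this figure with a storage vendor's quoted sustained read throughput is a quick first filter before a proper benchmark.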

Figure: AI cluster stack, what you procure together

Step 4 — Software stack

NVIDIA AI Enterprise + CUDA toolkit + cuDNN.

Containers and orchestration: NVIDIA NGC container images, with Kubernetes + Kubeflow or Slurm for training orchestration.

Experiment tracking and model registry: MLflow / Weights & Biases.

Frameworks: PyTorch + Hugging Face Transformers + DeepSpeed for distributed training.
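With PyTorch in the stack, multi-node jobs are commonly started with `torchrun`, whether from a Slurm batch script or a Kubernetes job. The helper below is a small sketch that assembles such a command line; the helper itself, the script name `train.py`, the host name, and the job id are hypothetical, while the flags are standard `torchrun` options.

```python
import shlex


def torchrun_command(nnodes, gpus_per_node, rdzv_host,
                     script="train.py", rdzv_port=29500, job_id="job0"):
    """Build the argv list for launching one node's share of a torchrun job."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",        # one process per GPU
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={rdzv_host}:{rdzv_port}",  # same value on every node
        f"--rdzv_id={job_id}",
        script,
    ]


# Hypothetical two-node job on 8-GPU servers; "gpu-node-01" is a placeholder.
cmd = torchrun_command(nnodes=2, gpus_per_node=8, rdzv_host="gpu-node-01")
print(shlex.join(cmd))
```

The same command is run on every node; the rendezvous endpoint lets the processes find each other over the cluster network.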

Step 5 — Facility

Power: 1 MW+ for a serious AI cluster. Pre-survey a colocation site or upgrade the on-site facility.

Cooling: direct-to-chip (D2C) liquid cooling for B200-and-above density. Air cooling for H100.

Network connectivity: dedicated egress for data movement.
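A first-pass facility power figure for the pre-survey can be estimated from server count, per-server draw, and the facility's PUE. The sketch below is illustrative only: the function name is hypothetical, and the 10 kW per server, 10% allowance for switches and storage, and PUE of 1.3 are placeholder assumptions to be replaced with datasheet and site values.

```python
def facility_power_kw(servers, kw_per_server, overhead_fraction=0.10, pue=1.3):
    """Estimate total facility draw in kW, including cooling and distribution.

    overhead_fraction covers non-GPU IT load (switches, storage, management).
    pue is power usage effectiveness: facility power divided by IT power.
    """
    it_load_kw = servers * kw_per_server * (1 + overhead_fraction)
    return it_load_kw * pue


# Hypothetical 64-GPU cluster: 8 servers at a placeholder 10 kW each.
estimate = facility_power_kw(servers=8, kw_per_server=10.0)
print(f"Plan for roughly {estimate:.0f} kW at the facility level")
```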

What Servnet does

Servnet runs UK enterprise AI infrastructure builds end-to-end. Engagement: 1) workload sizing (model + training pattern + concurrency), 2) sized commercial bid across DGX + Supermicro options, 3) facility pre-survey (power + cooling + space), 4) deployment + commissioning, 5) optional ongoing managed AI infrastructure service.

FAQs — Building your first UK on-prem AI cluster

Sizing

What's the minimum viable AI cluster?

1× Supermicro SYS-821GE (8× H100/H200) or 1× NVIDIA DGX B200 is a credible starting point for fine-tuning and inference workloads. Pre-training needs at least 16 GPUs to be useful.

How much does a starter cluster cost?

UK list pricing for an 8-GPU H200 server is £180-280k; an NVIDIA DGX B200 is £350-500k. Fabric switching, storage, racks, cabling, and deployment typically add £100-200k. Total for an 8-GPU starter cluster: roughly £300-700k.
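The total follows from adding the ancillary range to each server option. A minimal sketch of that arithmetic, using only the figures quoted above (in £k); the function and variable names are illustrative:

```python
def total_range(server_range_k, extras_range_k=(100, 200)):
    """Add the ancillary cost range to a server price range (all values in £k)."""
    low = server_range_k[0] + extras_range_k[0]
    high = server_range_k[1] + extras_range_k[1]
    return (low, high)


h200_total = total_range((180, 280))   # 8-GPU H200 server option
dgx_total = total_range((350, 500))    # NVIDIA DGX B200 option
overall = (min(h200_total[0], dgx_total[0]), max(h200_total[1], dgx_total[1]))

print(h200_total)  # (280, 480)
print(dgx_total)   # (450, 700)
print(overall)     # (280, 700)
```

The computed span of £280-700k matches the quoted £300-700k once the lower bound is rounded.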