For years the fastest storage in a server lived inside that server, attached to its PCIe bus and stranded there whether the box was saturated or sitting idle. NVMe over Fabrics breaks that link. It lets a pool of flash sit in its own enclosure and present drives to many hosts across a network at something very close to local latency, so capacity and performance can be allocated where the work actually is rather than where the drives happen to be bolted. For UK teams building dense virtualisation, analytics or AI estates, disaggregating flash is one of the more consequential architecture shifts of the decade, and it is worth understanding before you buy your next tray of NVMe.
What NVMe-oF actually does
NVMe-oF carries the NVMe command set over a network instead of over the local PCIe bus, so a remote drive behaves to the operating system almost exactly like a local one. The protocol was designed to preserve NVMe's many parallel queues and low per-command overhead, which is why a well-built RDMA fabric can add roughly ten microseconds or less over direct attachment, rather than the much larger penalty a traditional SCSI-based SAN block protocol would impose.
The practical effect is disaggregation: flash lives in a target enclosure, hosts become initiators, and the network in between is fast and predictable enough that the distance stops mattering. You stop sizing every server for its peak storage need and instead size a shared flash pool once, then carve it up. That is the same logic that made shared storage attractive in the first place, except now it is fast enough for workloads that previously demanded local NVMe.
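To make the initiator and target roles concrete, here is a minimal sketch of how a host might attach a remote subsystem with the Linux `nvme-cli` tool. It only builds the command line and does not run it. The `Target` class, the address `192.0.2.10` and the NQN are hypothetical placeholders, and the sketch assumes one of the IP-based transports on the standard NVMe-oF port 4420.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of one NVMe-oF subsystem on a target enclosure."""
    transport: str    # "tcp", or "rdma" (the nvme-cli name used for RoCE)
    address: str      # target IP address (placeholder value below)
    nqn: str          # NVMe Qualified Name of the subsystem (placeholder)
    port: int = 4420  # IANA-registered NVMe-oF port for IP transports


def connect_command(target: Target) -> list[str]:
    """Build, but do not run, the nvme-cli command that attaches the subsystem."""
    if target.transport not in {"tcp", "rdma"}:
        # Fibre Channel addressing differs, so it is left out of this sketch.
        raise ValueError("this sketch only covers the IP-based transports")
    return [
        "nvme", "connect",
        "--transport", target.transport,
        "--traddr", target.address,
        "--trsvcid", str(target.port),
        "--nqn", target.nqn,
    ]


# Assumed example values, not taken from any real deployment.
example = Target("tcp", "192.0.2.10", "nqn.2024-01.example:flash-pool-1")
print(" ".join(connect_command(example)))
```

Once a command like this succeeds on a real host, the remote namespace appears as an ordinary NVMe block device, which is what lets the rest of the stack treat it like local flash.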
Three transports: RoCE, TCP and Fibre Channel
There are three mainstream ways to carry NVMe-oF, and the choice shapes cost, latency and operational skill. RoCE (RDMA over Converged Ethernet) gives the lowest latency by letting the NIC move data directly into host memory, but it needs a carefully configured lossless Ethernet fabric with priority flow control, which is real network engineering. It is the choice when latency is the whole point.
NVMe/TCP runs over ordinary Ethernet with no special fabric tuning, which makes it by far the easiest to deploy on hardware you already own. It costs a little more latency than RoCE but is good enough for a large share of workloads, and it has become the pragmatic default for teams who want disaggregation without rebuilding their network. NVMe over Fibre Channel suits shops with an existing FC SAN investment and the discipline that goes with it, reusing the fabric they already trust.
- RoCE: lowest latency, needs a lossless tuned Ethernet fabric and the skills to run it
- NVMe/TCP: runs on standard Ethernet, easiest to adopt, slightly higher latency
- NVMe/FC: reuses an existing Fibre Channel SAN and its operational discipline
- Pick the transport from your latency target and the network skills you actually have
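The decision in that list can be written down as a small rule of thumb. This is a sketch rather than a sizing tool: the function name, the three yes/no inputs and the order in which they are checked are all assumptions made for illustration, and a real choice would weigh measured latency targets and cost as well.

```python
def pick_transport(latency_critical: bool,
                   can_run_lossless_ethernet: bool,
                   has_fc_san: bool) -> str:
    """Suggest an NVMe-oF transport from three coarse yes/no inputs.

    The precedence below is an assumption for illustration only.
    """
    # RoCE only makes sense when latency is the point AND the team can
    # operate a lossless Ethernet fabric with priority flow control.
    if latency_critical and can_run_lossless_ethernet:
        return "RoCE"
    # An existing Fibre Channel SAN can be reused instead of a new fabric.
    if has_fc_san:
        return "NVMe/FC"
    # Otherwise fall back to the transport that runs on standard Ethernet.
    return "NVMe/TCP"


print(pick_transport(latency_critical=True,
                     can_run_lossless_ethernet=False,
                     has_fc_san=False))
```

Note that a latency-critical workload without the skills to run a lossless fabric still lands on NVMe/TCP here, which mirrors the point that the transport has to match the network skills you actually have.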
The hardware behind a target
A disaggregated flash target is, at heart, a dense NVMe enclosure with fast network ports and enough controller capability to serve the drives without becoming the bottleneck. That means high-lane-count PCIe to fan out to many drives, dual high-speed NICs sized to the aggregate drive bandwidth, and a controller or HBA path chosen for pass-through rather than legacy RAID, since the resilience usually lives in the storage software above. Our host bus adapters range covers the controller side of that build.
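Sizing NICs to aggregate drive bandwidth is simple arithmetic, and a rough sketch helps show how quickly the numbers add up. The drive count, per-drive throughput, port speed and the 90% usable fraction below are illustrative assumptions, not measurements of any particular enclosure.

```python
import math


def nic_ports_needed(drives: int,
                     gbytes_per_s_per_drive: float,
                     nic_gbits_per_s: float,
                     usable_fraction: float = 0.9) -> int:
    """Ports needed to carry every drive's sustained throughput at once.

    usable_fraction is an assumed allowance for protocol overhead.
    """
    # Drive throughput is quoted in gigabytes per second, NIC speed in
    # gigabits per second, so convert before comparing.
    aggregate_gbits = drives * gbytes_per_s_per_drive * 8
    usable_per_port = nic_gbits_per_s * usable_fraction
    return math.ceil(aggregate_gbits / usable_per_port)


# Assumed example: 24 drives sustaining 3 GB/s each, over 100 Gb/s ports.
print(nic_ports_needed(24, 3.0, 100))
```

With these assumed figures the drives could in aggregate fill several 100 Gb/s ports, so in practice you decide how much of the raw drive bandwidth the fabric really needs to expose and size the ports to that.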
Drive selection matters as much as the enclosure. A target that serves many initiators sees a blended, often write-heavy profile, so endurance class and consistent latency under load matter more than headline sequential numbers. We size enclosures, fabrics and media together rather than in isolation; the flash itself comes from our SSD and NVMe range, matched to the read/write mix the pool will actually carry.
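Endurance class is usually expressed as drive writes per day (DWPD), and the blended write load of a shared pool can be turned into a required figure with one division. The sketch below assumes writes are spread evenly across the drives; the `protection_write_multiplier` parameter is a hypothetical knob for extra writes generated by the storage software above (for example mirroring), and the example numbers are invented for illustration.

```python
def required_dwpd(pool_writes_tb_per_day: float,
                  drives: int,
                  drive_capacity_tb: float,
                  protection_write_multiplier: float = 1.0) -> float:
    """Drive writes per day each drive must sustain for a given pool load.

    Assumes writes are spread evenly across all drives in the pool.
    """
    total_capacity_tb = drives * drive_capacity_tb
    written_tb = pool_writes_tb_per_day * protection_write_multiplier
    return written_tb / total_capacity_tb


# Assumed example: 100 TB/day of host writes, mirrored (2x), onto
# 24 drives of 7.68 TB each.
print(round(required_dwpd(100, 24, 7.68, protection_write_multiplier=2.0), 2))
```

Comparing the result with the DWPD rating on a drive's data sheet gives a first indication of whether a read-intensive or mixed-use endurance class is appropriate for the pool.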
Where disaggregation pays and where it does not
Disaggregation pays when storage need and compute need grow at different rates, when you want to drive utilisation up by sharing an expensive flash pool across many hosts, or when diskless or thin-provisioned hosts simplify your fleet. Analytics clusters, large virtualisation estates and AI pipelines that must keep accelerators fed are natural fits, and the pattern sits comfortably alongside dense storage platforms such as those we build on HPE Apollo.
It does not pay when a workload is small, self-contained and happy with the local NVMe it already has; adding a fabric there is complexity for no gain. It also does not pay if you cannot commit to running the network properly, because a congested fabric, or a poorly tuned lossless one under RoCE, will undo the latency advantage that justified the whole exercise. Disaggregation is a deliberate architecture choice, not a default upgrade.
Resilience and the network as a dependency
When flash leaves the server, the network becomes a storage dependency, which changes how you think about resilience. Dual fabrics, multipath from initiator to target, and redundant NICs and ports stop being nice-to-haves and become the difference between a degraded path and an outage. The target enclosure itself needs the usual dual power and redundant components, but the bigger shift is treating the fabric with the same seriousness you would treat a SAN.
Right-sized, that dependency is a feature rather than a risk: multipath and a healthy fabric let you lose a NIC, a switch or a cable and keep serving. The design work is in sizing the fabric for the aggregate bandwidth of the pool plus headroom, and in making sure congestion control is configured so one noisy host cannot starve the rest. We design both the enclosure and the fabric in our server configuration service.
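The sizing rule described here, aggregate pool bandwidth plus headroom that still holds after a failure, can be checked with a few lines. This is a minimal sketch that assumes equal-speed paths, a single failure at a time and a 20% headroom figure chosen purely for illustration.

```python
def survives_single_failure(pool_gbits: float,
                            paths: int,
                            path_gbits: float,
                            headroom: float = 0.2) -> bool:
    """True if the fabric still carries pool load plus headroom with one path down.

    Assumes all paths run at the same speed and only one fails at a time.
    """
    if paths < 2:
        # With a single path there is nothing left to fail over to.
        return False
    required = pool_gbits * (1 + headroom)
    remaining = (paths - 1) * path_gbits
    return remaining >= required


# Assumed example: a pool that needs 300 Gb/s, served over 100 Gb/s paths.
for n in (4, 5, 6):
    print(n, survives_single_failure(300, n, 100))
```

In this assumed case four paths cover the raw load only while everything is healthy, so the design needs an extra path before a lost NIC, switch or cable becomes a degraded path rather than a shortfall.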
Putting it together
If you are weighing disaggregated flash, start from the workload and the network, not the enclosure. Decide whether your latency target needs RoCE or whether NVMe/TCP on the Ethernet you already run is enough, size the target for aggregate bandwidth and a realistic write mix, and build the fabric for redundancy from the outset. For the upstream architecture question of whether to grow by adding nodes or scaling a single system, read scale-out vs scale-up storage, and for the media-economics side of filling these enclosures see HDD vs QLC vs TLC tiering.