A petabyte used to be a data-centre's worth of storage; in 2026 it is a design exercise that fits in a few rack units if you get the architecture right. But assembling a petabyte is not simply a matter of buying enough disks, because the way you reach that capacity, the media you choose, the redundancy scheme and the path from drives to network all decide whether the result is fast, resilient and affordable or merely large. This is a reference architecture for building a one-petabyte storage server in the UK: how to get to a petabyte, the media economics, and the engineering that keeps it serviceable.
Two ways to reach a petabyte
There are broadly two routes to a petabyte. The first is density within a chassis: a storage-dense server that packs as many large drives as possible into a standard-depth box, reaching enormous capacity in a small footprint. The second is expansion: a head node attached to one or more JBOD shelves that add drive bays without adding servers, daisy-chained over SAS so the capacity grows beyond what a single chassis holds.
The two combine in practice. A dense head node, such as a member of the HPE Apollo family, provides the compute and a large first tranche of drives, and JBOD expansion shelves extend it to a petabyte and beyond. The choice of how much to put in the head node versus the shelves is an early architectural decision that affects cost, serviceability and how the capacity scales, so it is worth making deliberately rather than by accident.
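The head-node-versus-shelves split comes down to simple arithmetic, sketched below. The 20 TB drive size, the 8+2 protection layout and the bay counts (24 in the head node, 60 per shelf) are illustrative assumptions, not figures for any particular chassis, and filesystem overhead and spares are ignored.

```python
import math


def drives_needed(usable_tb, drive_tb, data_shards, parity_shards):
    """Drives required to reach a usable capacity under k+m protection."""
    total_shards = data_shards + parity_shards
    # raw = usable * (k + m) / k, then divide by per-drive capacity
    return math.ceil(usable_tb * total_shards / (data_shards * drive_tb))


def shelves_needed(total_drives, head_bays, shelf_bays):
    """JBOD shelves required once the head node's bays are full."""
    overflow = max(0, total_drives - head_bays)
    return math.ceil(overflow / shelf_bays)


# Hypothetical example: 1 PB (1000 TB) usable on 20 TB drives with 8+2 protection.
drives = drives_needed(usable_tb=1000, drive_tb=20, data_shards=8, parity_shards=2)
shelves = shelves_needed(drives, head_bays=24, shelf_bays=60)
print(f"{drives} drives: 24 in the head node, the rest across {shelves} shelf(s)")
```

Changing the head-node bay count in this sketch shows how quickly the balance shifts between a single dense chassis and a head node plus shelves.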
Media economics: HDD, hybrid or all-QLC
The single biggest lever on the cost and character of a petabyte is the media. High-capacity hard drives remain the cheapest path to bulk capacity and are the right answer for archival, backup and capacity-led data where throughput matters more than latency. An all-flash petabyte, typically built on dense QLC NAND, is far faster and denser but more expensive per terabyte, and is justified where the workload genuinely needs the performance across the whole dataset.
Most petabyte designs land in between, as a hybrid: a flash tier for the hot, latency-sensitive data and the metadata, with the bulk of the capacity on hard drives. This captures most of the performance benefit where it is felt while keeping the cost per usable terabyte close to the HDD floor for the cold majority. The right media mix is a workload decision, and getting it right is where the economics of a petabyte are won or lost.
- HDD: cheapest bulk capacity, best for archive, backup and capacity-led data
- All-QLC flash: fastest and densest, justified when the whole dataset needs speed
- Hybrid: flash for hot data and metadata, HDD for the cold bulk - usually the sweet spot
- Match the media mix to how the dataset is actually accessed
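To see why a hybrid stays close to the HDD floor, a blended cost per terabyte can be computed from the flash fraction. The per-terabyte prices below are placeholder numbers chosen only to make the arithmetic visible. They are not market prices, and the function name is hypothetical.

```python
def blended_cost_per_tb(flash_fraction, flash_cost_per_tb, hdd_cost_per_tb):
    """Weighted cost per TB for a pool that is part flash, part HDD."""
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    hdd_fraction = 1.0 - flash_fraction
    return flash_fraction * flash_cost_per_tb + hdd_fraction * hdd_cost_per_tb


# Placeholder prices (assumptions): flash at 80 per TB, HDD at 15 per TB.
for fraction in (0.0, 0.1, 0.25, 1.0):
    cost = blended_cost_per_tb(fraction, flash_cost_per_tb=80, hdd_cost_per_tb=15)
    print(f"{fraction:.0%} flash -> {cost:.2f} per TB")
```

With these placeholder inputs, a 10% flash tier adds a modest premium over all-HDD, while all-flash costs several times as much. The real answer depends on current pricing and on how much of the dataset is genuinely hot.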
Redundancy at petabyte scale
Redundancy choices that are fine on a small pool become critical at a petabyte, because the drives are large and rebuilds are correspondingly slow and exposed. Simple single-parity schemes are risky across multi-terabyte drives, since a second failure during a long rebuild can lose data, so petabyte designs lean on double-parity or erasure coding that can tolerate more than one concurrent failure across the set.
Erasure coding, in particular, is the workhorse of large-capacity systems because it delivers resilience with far less capacity overhead than full mirroring, which matters enormously when you are protecting a petabyte. The trade-off is more compute to encode and decode, and longer rebuilds, so the redundancy scheme, the drive size and the recovery behaviour have to be designed together. This is the same rebuild-risk discipline that governs drive-size choices throughout dense storage.
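The capacity overhead and rebuild exposure can be estimated with a few lines of arithmetic. This is a simplified sketch: the 8+2 layout, the 20 TB drive and the sustained rebuild rate of 100 MB/s are assumptions, and real rebuild times depend on pool load, fill level and the storage software.

```python
def raw_tb_required(usable_tb, data_shards, parity_shards):
    """Raw capacity needed to present usable_tb under a k+m layout."""
    return usable_tb * (data_shards + parity_shards) / data_shards


def rebuild_hours(drive_tb, rebuild_mb_per_s):
    """Time to rewrite one full drive at a sustained rate (decimal units)."""
    seconds = (drive_tb * 1e12) / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# Two-way mirroring behaves like a 1+1 layout; 8+2 is an example erasure code.
mirror_raw = raw_tb_required(1000, data_shards=1, parity_shards=1)
ec_raw = raw_tb_required(1000, data_shards=8, parity_shards=2)
print(f"mirror: {mirror_raw:.0f} TB raw, 8+2: {ec_raw:.0f} TB raw")

# Assumed sustained rebuild rate of 100 MB/s onto a 20 TB drive.
print(f"rebuild window: about {rebuild_hours(20, 100):.0f} hours")
```

Under these assumptions the erasure-coded layout needs 750 TB less raw capacity than mirroring for the same petabyte, while a single drive rebuild runs for more than two days, which is the window the parity count has to cover.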
The data path: drives to network
A petabyte of drives is useless if the path to them is a bottleneck, so the engineering between disk and network matters as much as the capacity. Drives connect through SAS expanders and host bus adapters in pass-through mode to the storage software, which manages the pool and presents it; the HBA and expander layout has to be sized so it is not the choke point for the aggregate throughput of all those drives.
At the front, the network has to be able to carry the bandwidth the workload demands, which for a petabyte often means multiple high-speed links, 25GbE, 100GbE or faster, sized to the use case rather than the capacity. A large backup target needs steady throughput; a media or analytics tier needs far more. Get the internal path and the external network in proportion to the drives, or the petabyte will feel slow regardless of how much flash you bought.
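One way to check that the path is in proportion is to express each stage in the same unit and find the smallest. The figures in this sketch are assumptions for illustration: 63 drives at 200 MB/s each, two HBAs with eight lanes at roughly 1.2 GB/s usable per lane, and two 25GbE links with protocol overhead ignored.

```python
def bottleneck(stages):
    """Return the (name, GB/s) of the slowest stage in the data path."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# All values in GB/s; every input here is an illustrative assumption.
stages = {
    "drives": 63 * 200 / 1000,   # 63 drives at 200 MB/s sequential each
    "hba": 2 * 8 * 1.2,          # 2 HBAs x 8 lanes x ~1.2 GB/s per lane
    "network": 2 * 25 / 8,       # 2 x 25GbE, bits to bytes
}

slowest, rate = bottleneck(stages)
print(f"bottleneck: {slowest} at {rate:.2f} GB/s")
```

In this example the network is the limiting stage, so adding drives or flash would not make the system feel any faster until the front-end links are upgraded.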
Serviceability and the cost of downtime
At a petabyte, serviceability stops being a nicety and becomes part of the design, because the more drives you have, the more often one will fail, and the architecture has to make that a routine, non-disruptive event. Hot-swap drive bays, redundant power and cooling, and a pool that tolerates and recovers from drive loss without taking the system offline are what keep a petabyte running rather than lurching from outage to outage.
The boot device should sit on its own mirrored pair, separate from the data, so the operating environment is independent of the pool it serves. And because rebuilds at this scale take time, the design has to assume that further drives will fail while another drive is rebuilding, and survive it. A petabyte built without this discipline is not a storage system; it is a large and fragile pile of disks. We engineer the resilience and serviceability together; see our Dell storage options for shared-array alternatives.
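A rough feel for how routine drive failure becomes can be had from a simple model. The sketch below assumes independent failures at a constant annualised failure rate (AFR); the 1.5% AFR, the 63-drive pool and the 56-hour rebuild window are assumptions, and real fleets often show correlated failures that this model understates.

```python
HOURS_PER_YEAR = 8760


def expected_failures_per_year(drive_count, afr):
    """Average number of drive failures per year across the pool."""
    return drive_count * afr


def prob_further_failure(drive_count, afr, rebuild_hours):
    """Chance that at least one surviving drive fails during a rebuild."""
    per_drive = afr * rebuild_hours / HOURS_PER_YEAR
    survivors = drive_count - 1
    return 1 - (1 - per_drive) ** survivors


# Assumed inputs: 63 drives, 1.5% AFR, 56-hour rebuild window.
print(f"failures per year: {expected_failures_per_year(63, 0.015):.2f}")
print(f"P(second failure during rebuild): {prob_further_failure(63, 0.015, 56):.4f}")
```

Even with optimistic inputs, the expected failure count approaches one drive per year at this size and grows linearly with drive count, which is why hot-swap bays and multi-failure tolerance are treated as baseline requirements rather than extras.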
Putting it together
A one-petabyte server is a balanced design, not a disk count: reach the capacity through a dense head node plus JBOD expansion, choose a media mix, usually hybrid, that matches how the data is accessed, protect it with double-parity or erasure coding sized against rebuild risk, and proportion the HBA path and network to the aggregate throughput. Build in hot-swap serviceability and a separate mirrored boot so failures are routine. We design petabyte architectures around the HPE Apollo family and the right SSD and NVMe tiers.