Ceph and other scale-out software-defined storage platforms turn ordinary servers into a single, resilient pool, but the cluster is only ever as good as the nodes underneath it. Get the per-node hardware recipe wrong and you end up with a cluster that is slow, lopsided or starved of memory under recovery. The HPE Apollo, with its high drive density in a conventional chassis, is a natural building block for these clusters. This is the hardware selection guide. It covers how to spec an Apollo node for Ceph (the memory, flash, drives and networking that make the cluster behave) rather than the software architecture itself.
Why density makes Apollo a good SDS node
Scale-out SDS platforms grow by adding nodes that each contribute storage, so the more disks a node holds in a given amount of rack space, power and management overhead, the better the economics of the whole cluster. The HPE Apollo packs a large number of drives into a standard-depth chassis, which means fewer, denser nodes per petabyte and therefore fewer servers to power, cool and manage. It is the same density logic that runs through the wider HPE Apollo family.
Density alone is not the whole story, though, because a node that is all disk and no balance will bottleneck. A good SDS node pairs that drive density with enough CPU, memory, flash and network to keep every disk usefully busy and to survive recovery without falling over. The art of speccing an Apollo node for Ceph is balancing the drive count against those other resources, so the density is an asset rather than a liability under load.
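The density argument can be put into rough numbers. The Python sketch below estimates how many nodes a target usable capacity needs. The drive counts, drive size and three-way replication are illustrative assumptions, not figures for any particular Apollo model.

```python
import math


def nodes_needed(usable_pb: float, drives_per_node: int,
                 drive_tb: float, replica_size: int = 3) -> int:
    """Nodes required to hold usable_pb of data under simple replication.

    All inputs are illustrative assumptions; real sizing also leaves
    free space for recovery and growth.
    """
    raw_tb_required = usable_pb * 1000 * replica_size
    raw_tb_per_node = drives_per_node * drive_tb
    return math.ceil(raw_tb_required / raw_tb_per_node)


# Hypothetical comparison: a 12-drive node versus a 24-drive node.
for drives in (12, 24):
    print(drives, "drives/node ->", nodes_needed(1.0, drives, 16.0), "nodes")
```

Doubling the drives per node halves the node count for the same capacity, which is the economic case for density. It says nothing about whether each node can keep those drives busy.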
Drives and the OSD model
In Ceph each drive typically becomes an OSD, an object storage daemon that owns that disk, so the drive count per node directly sets how many OSDs the node runs and how much it contributes to the pool. Large-capacity drives maximise raw petabytes, which is the point of a dense node, but they also lengthen recovery times, because rebuilding the data from a failed large drive across the cluster takes longer and stresses the survivors while it runs.
Present the drives to Ceph through a plain host bus adapter in pass-through mode, never a hardware RAID controller, because the software wants to own each disk natively and manage redundancy itself. The balance to strike is capacity against recovery: very large drives give the most petabytes per node but slower, heavier rebuilds, so the right drive size is a deliberate trade rather than simply the biggest available. Choose the exact drives with our SSD and NVMe guidance.
- One drive usually equals one OSD; drive count sets the node's contribution
- Large drives maximise capacity but lengthen and intensify recovery
- Always present disks via HBA pass-through, never hardware RAID
- Pick drive size as a capacity-versus-rebuild trade, not just the biggest
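To make the capacity-versus-rebuild trade concrete, here is a minimal Python sketch that estimates how long it takes to re-create the data from one failed drive. The fill ratio and the aggregate recovery throughput are assumptions chosen for illustration. Real recovery rates depend on cluster size, throttling settings and client load.

```python
def rebuild_hours(drive_tb: float, fill_ratio: float,
                  recovery_mb_s: float) -> float:
    """Hours to re-replicate the data held on one failed drive.

    drive_tb      -- raw drive capacity in decimal terabytes
    fill_ratio    -- fraction of the drive holding data (0.0 to 1.0)
    recovery_mb_s -- ASSUMED aggregate cluster recovery throughput, MB/s
    """
    data_mb = drive_tb * 1_000_000 * fill_ratio
    return data_mb / recovery_mb_s / 3600


# Hypothetical drive sizes, 70% full, 500 MB/s assumed recovery rate.
for size_tb in (8, 16, 24):
    print(f"{size_tb} TB drive: {rebuild_hours(size_tb, 0.7, 500):.1f} h")
```

At a fixed recovery rate the rebuild window grows linearly with drive size, so tripling the drive capacity triples the time the cluster spends running degraded.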
Memory: the resource Ceph nodes run short of
Memory is the resource SDS nodes most often run short of, because each OSD needs a working allowance of RAM, and that requirement multiplies by the number of drives in a dense node. A node packed with disks therefore needs substantial, ECC-protected memory simply to run its OSDs comfortably, and skimping here is a common cause of clusters that behave well until they are busy and then struggle.
Recovery is where the memory headroom proves itself. When a drive or node fails, the cluster rebuilds the missing data across the survivors, and that recovery work consumes additional CPU and memory on top of the normal serving load. A node sized only for steady state will be under-resourced exactly when it matters most, so size the memory for the OSD count plus recovery headroom, not for an idle day. Specify the right modules with our memory and RAM guidance.
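A rough sizing rule for the above can be sketched in Python. The 4 GiB per OSD matches the default value of Ceph's `osd_memory_target` setting. The recovery headroom factor and the operating system reserve are assumptions to adjust for your own cluster, not Ceph recommendations.

```python
def node_memory_gib(osd_count: int,
                    osd_target_gib: float = 4.0,
                    recovery_headroom: float = 0.5,
                    os_reserve_gib: float = 16.0) -> float:
    """Estimate the RAM a node needs for its OSDs plus recovery headroom.

    osd_target_gib    -- per-OSD memory target (Ceph's default is 4 GiB)
    recovery_headroom -- ASSUMED extra fraction on top of steady state
    os_reserve_gib    -- ASSUMED allowance for the OS and other daemons
    """
    steady_state = osd_count * osd_target_gib
    return steady_state * (1 + recovery_headroom) + os_reserve_gib


# Hypothetical 24-drive node: 96 GiB steady state, 160 GiB with headroom.
print(node_memory_gib(24))
```

Sizing only for the steady-state figure is the trap described above: the headroom term is what keeps the node stable while a rebuild is running.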
Flash for WAL and metadata
A dense node built entirely from large capacity drives can be made markedly faster by adding a small amount of flash for the latency-sensitive parts of the storage engine. In Ceph the write-ahead log and metadata benefit from living on fast NVMe rather than on the slow capacity disks, so a node typically pairs its bulk drives with one or more high-endurance NVMe devices dedicated to that role.
This is the same hot-tier principle that applies to any storage server: put the small, frequently-touched, write-heavy data on flash and leave the bulk on cheaper, denser media. The flash devices carrying the write-ahead log are written hard, so they want high endurance, and they should be sized and counted so a single device does not become a bottleneck or a single point of failure for too many OSDs at once.
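The sizing and counting question can be sketched as a small calculation. The Python below assumes that every OSD sharing a flash device goes offline if that device fails, so the share of OSDs per device is also the share of the node lost in one flash failure. The per-OSD flash allocation is a placeholder assumption, not a Ceph-mandated figure.

```python
import math


def flash_plan(osd_count: int, osds_per_nvme: int,
               flash_gb_per_osd: float) -> tuple[int, float, float]:
    """Return (device count, minimum GB per device, failure share).

    failure share is the fraction of the node's OSDs that depend on a
    single fully loaded flash device, assuming losing the device takes
    those OSDs down.
    """
    devices = math.ceil(osd_count / osds_per_nvme)
    min_device_gb = osds_per_nvme * flash_gb_per_osd
    failure_share = osds_per_nvme / osd_count
    return devices, min_device_gb, failure_share


# Hypothetical 24-OSD node, 6 OSDs per NVMe device, 300 GB of flash each.
print(flash_plan(24, 6, 300))
```

Fewer, larger flash devices are cheaper but raise the failure share. Comparing a few values of `osds_per_nvme` side by side makes that trade visible before the hardware is ordered.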
Networking and the minimum viable cluster
Networking is decisive for SDS, because the cluster constantly moves data between nodes for replication and recovery, and that traffic is in addition to the client load. A node wants fast, redundant links: 25GbE is a sensible baseline, with higher speeds where the node is dense or the cluster large. The links are often split so that client traffic and the cluster's internal replication traffic do not contend for the same bandwidth.
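Two quick checks help decide whether a given link speed suits a node. The Python sketch below compares the combined streaming throughput of the drives with the network link, and estimates the internal traffic generated by replicated writes, on the assumption that the primary OSD forwards each write to the remaining replicas. The per-drive throughput and the client write rate are illustrative assumptions.

```python
def drive_aggregate_gbps(drive_count: int, drive_mb_s: float) -> float:
    """Combined sequential throughput of all drives, in Gbit/s."""
    return drive_count * drive_mb_s * 8 / 1000


def replication_gbps(client_write_gbps: float, replica_size: int) -> float:
    """Internal traffic created by replicating client writes.

    Assumes the primary OSD sends one copy to each other replica.
    """
    return client_write_gbps * (replica_size - 1)


# Hypothetical node: 24 drives at an assumed 200 MB/s each.
print(drive_aggregate_gbps(24, 200))   # compare against the link speed
# Assumed 10 Gbit/s of client writes with three-way replication.
print(replication_gbps(10, 3))
```

In this hypothetical case the drives can stream more than a single 25GbE link carries, and replication traffic is twice the client write rate. Both are arguments for faster links or for separating the two traffic classes.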
Scale-out also has a floor. A resilient Ceph cluster needs a minimum number of nodes for its replication or erasure-coding scheme to tolerate a node failure, typically at least three, so the smallest sensible cluster is several balanced Apollo nodes rather than one large one. Building below that floor gives you density without resilience, which defeats the purpose. We design the node and the minimum cluster together as part of our HPE Apollo work.
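The floor can be expressed as a simple rule, sketched here in Python. It assumes each replica or erasure-coded chunk must be placed on a different host. The `spare_hosts` parameter is a hypothetical addition for clusters that should be able to rebuild to full redundancy after losing a node, rather than merely keep running degraded.

```python
from typing import Optional


def min_hosts(replica_size: Optional[int] = None,
              ec_k: Optional[int] = None,
              ec_m: Optional[int] = None,
              spare_hosts: int = 0) -> int:
    """Smallest host count for a scheme with one copy or chunk per host.

    Pass replica_size for replication, or ec_k and ec_m for erasure
    coding (k data chunks plus m coding chunks).
    """
    if replica_size is not None:
        placement = replica_size
    elif ec_k is not None and ec_m is not None:
        placement = ec_k + ec_m
    else:
        raise ValueError("give replica_size, or both ec_k and ec_m")
    return placement + spare_hosts


print(min_hosts(replica_size=3))                 # three-way replication
print(min_hosts(ec_k=4, ec_m=2))                 # 4+2 erasure coding
print(min_hosts(ec_k=4, ec_m=2, spare_hosts=1))  # with room to rebuild
```

Under these assumptions erasure coding raises the floor well above three nodes, which is worth knowing before choosing a scheme for a small cluster.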
Putting it together
A good Apollo SDS node balances four things against its drive density: enough ECC memory for every OSD plus recovery headroom, a small high-endurance NVMe tier for the write-ahead log and metadata, drives sized as a capacity-versus-rebuild trade and presented through an HBA, and fast, redundant networking with client and replication traffic separated. Build at least three such nodes so the cluster can survive a failure. We size the per-node recipe and the cluster floor together; talk to us through our HPE Apollo practice.