I've run an Openstack cloud. Local to the host NVME's directly attached to VMs i...

yencabulator · 2026-04-25T16:02:44 1777132964

> There's not enough redundancy. You could raid1 those NVME's before they get attached to a VM and that helps with hardware failures, but you get less of them to attach. Even if you RAID them, there's not a good way to move that VM to another host if there's a RAM or CPU or other hardware issue on that host.

The trick is building a block storage system that treats the local disk as write-back cache with async replication to networked storage. Like the blog post says they'll be doing.

The async replication has some integrity/recovery concerns for sure, but it's the trick that enables local speeds. And people have been happy with async replication for their databases for a very long time. You just need good observability for the durability delay.

Once you have that, you can do live VM migration if you're careful enough about dirty data. The new node just starts out with an empty cache.
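
To make the shape of that concrete, here's a toy sketch in Python. It is not how any real system does it; everything in it is made up for illustration: the `WriteBackDisk` class, the `migrate` helper, and the use of plain dicts standing in for the local NVMe and the networked store. It only shows the three moving parts: writes acknowledged from local disk, a background replicator draining dirty blocks oldest-first, and a "durability lag" number you can export to monitoring.

```python
import time
from collections import OrderedDict


class WriteBackDisk:
    """Toy block device: local disk as a write-back cache over a remote store."""

    def __init__(self, remote, clock=time.monotonic):
        self.remote = remote        # dict standing in for networked storage (assumption)
        self.local = {}             # dict standing in for the local NVMe cache (assumption)
        self.dirty = OrderedDict()  # block number -> time of its oldest unreplicated write
        self.clock = clock

    def write(self, block, data):
        # Acknowledge once the local copy is updated; replication happens later.
        self.local[block] = data
        # Keep the timestamp of the first unreplicated write to this block.
        self.dirty.setdefault(block, self.clock())

    def read(self, block):
        # A cache hit is served locally; a miss is fetched from the remote store.
        if block not in self.local:
            self.local[block] = self.remote[block]
        return self.local[block]

    def replicate(self, max_blocks=None):
        # Meant to be called from a background loop: push the oldest dirty blocks first.
        count = 0
        while self.dirty and (max_blocks is None or count < max_blocks):
            block, _ = self.dirty.popitem(last=False)
            self.remote[block] = self.local[block]
            count += 1
        return count

    def durability_lag(self):
        # Age of the oldest write that exists only on this host.
        if not self.dirty:
            return 0.0
        oldest = next(iter(self.dirty.values()))
        return self.clock() - oldest


def migrate(src):
    # Drain all dirty data, then start the new node with an empty cache.
    src.replicate()
    assert not src.dirty
    return WriteBackDisk(src.remote, clock=src.clock)
```

A real implementation would also have to handle writes racing with replication, crash recovery of the dirty map, and write ordering on the remote side, which is where the "careful enough about dirty data" part comes in.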

It's not exactly trivial, but it's also probably not the biggest challenge if you're genuinely building a brand new cloud and going to compete against the hyperscalers. (Hell, hire me and I can write it for you. It'll take time and CPU hours to get stable, but the magic required is only mildly arcane.)

For example: https://dl.acm.org/doi/10.1145/3492321.3524271

solatic · 2026-04-24T06:13:42 1777011222

> Even if you RAID them, there's not a good way to move that VM to another host if there's a RAM or CPU or other hardware issue on that host.

This is the critical point. All hardware fails eventually. The CPU and RAM are, in a real sense, also ephemeral. The only relevant question is what the risk tolerance of the use-case is. If restoring from async backup is sufficient, then embrace ephemerality and keep backups. If you need round-the-clock availability, pick an architecture that lets you fail over gracefully to another machine, and embrace the ephemerality when you inevitably need to do so.

chatmasta · 2026-04-23T23:45:26 1776987926

> There's not enough redundancy

So build resiliency into your application layer.

sshine · 2026-04-24T19:17:21 1777058241

You don't always choose the application layer.

When you're an OpenStack cloud provider, your customers choose.

When you're a customer using Open Source software, your vendors choose.

Using a mixture of directly attached NVMe and network-attached volumes with backup is the sweet spot for me.

I don't need to maintain my own network filesystem (Ceph), and I can put applications that mirror their databases natively on NVMe and everything I don't have much control over on network-attached volumes.

I feel like there's something better not yet made.