Reporter of #15526 here. I've still only ever seen this corruption in files being built, which seems to be a situation where the race can be won repeatably by some packages. I reviewed all my user data and found no corruption in it, but my workloads don't involve block clones.
Coming back home and trying out 2.2.1, ZFS instantly started spewing write and checksum errors due to #15533. Both this and #15526 seem to have underlying issues dating back to before 2.2 that are just more easily triggered now.
First one also confirmed on FreeBSD now.
Holding off on 2.2 seems advisable, and if you're keeping critical data on OpenZFS it might be a good idea to give these issues a glance. The second one might share an underlying cause with an issue that has given me system freezes when approaching ~90% pool usage on 2.1 on top of LUKS.
I was glad for this feature to land, but being conservative with my main storage array, I decided not to immediately jump on this. Prudence pays off I guess.
That said, for my main workstation, I plan to migrate to bcachefs very quickly once it's mainlined. I haven't done enough introspection to be able to tell you why I can hold both opinions in my head so well.
Kent does plenty of things right during the development process, but having a decade of experience contributing to OpenZFS/ZFSOnLinux, I can say that I will be very surprised if everything goes as well as people seem to think. There are many bugs in a complex code base, and a simple 10x increase in userbase will undoubtedly result in more reports of people hitting them.
That said, I think he is laying a good foundation (from what I have heard/read about his development process), but there are many times when the best of us have been confident in code that turned out to have problems and I doubt he is an exception to this.
It is mostly from external observation. He seems to be the reason that bcachefs isn't being shipped in production today, as he is trying to work through his entire backlog rather than shipping it in a WIP state. This is very different from how the developers of a certain filesystem in Linux's kernel source tree, which I shall not name, do things. If I recall correctly, a developer of that filesystem told me several years ago that if they did not ship the code for users to use, they would not find bugs. That filesystem is not much less buggy today than it was back then. :/
So bcachefs is built on the foundation of bcache, which existed on its own as a block layer and worked in production for many years. That is layering that doesn't really exist in other filesystems. Not sure if that's what the parent meant.
That is not quite what I meant. Not shipping premature code that you know needs more work is something to be admired. It is something I wish more developers would do.
Anyway, it has been a while since I read anything about bcachefs, but what I have read struck me as being consistent with doing things well. For example, he is working on having an automated test suite in place before he ships it, which is a great thing to see:
> That said, I think he is laying a good foundation (from what I have heard/read about his development process), but there are many times when the best of us have been confident in code that turned out to have problems and I doubt he is an exception to this.
For mixed storages, bcachefs will be interesting.
For RAID1, another option is any filesystem over dm-integrity over mdadm. dm-integrity can protect against silent corruption when used below the mdadm level: if a read hits data that doesn't match its checksum at the dm-integrity layer, dm-integrity returns EILSEQ to mdadm, which then recovers the data from the other mirror.
1. it's "bring your own filesystem" (e.g. XFS, ext4): you add protection against bitrot to any filesystem
2. dm-integrity may be newer, but mdadm and XFS have a large user base, making them well tested.
3. you can test this approach by simulating bitflips on the underlying data device, reading from the /dev/mapper entry, reading files themselves, doing a scrub etc.
4. you can select algorithms other than crc32: in the rare case an error couldn't be caught by crc32 (which is likely already applied at the hardware level), you gain an extra layer of safety
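A minimal sketch of this stack, assuming two spare disks named /dev/sdb and /dev/sdc (hypothetical device names; these commands destroy their contents):

```shell
# 1. Put an integrity layer under each mirror leg. The default checksum
#    is crc32c; pass e.g. --integrity sha256 for a stronger one (point 4).
integritysetup format /dev/sdb
integritysetup open /dev/sdb int-b
integritysetup format /dev/sdc
integritysetup open /dev/sdc int-c

# 2. Build the RAID1 on top of the two integrity devices.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/mapper/int-b /dev/mapper/int-c

# 3. Bring your own filesystem (point 1).
mkfs.xfs /dev/md0

# Scrub: during a check, a sector dm-integrity refuses to return (EILSEQ)
# is rewritten by md from the healthy mirror.
echo check > /sys/block/md0/md/sync_action
```

Bitflip testing (point 3) is then just writing garbage to a spot on /dev/sdb with dd and reading the corresponding file back.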
The level of simplicity of ZFS has no equivalent in Linux; the whole LVM stack is a fuckup from a usability perspective: half a dozen distinct commands working on different layers that, more often than not, are purely conceptual. ZFS has been around for almost 20 years now. It was production ready more than 10 years ago, when it was open-sourced. Having your systems 1 or 2 versions behind the mainline is good practice in every mission-critical piece of software.
> Having your systems 1 or 2 versions behind the mainline is good practice in every mission-critical piece of software.
I do that, and I've even personally experienced a few rare ZFS bugs that seem to stem from interactions between ZFS and Western Digital firmware.
Still, I was caught unprepared: I have several backups not stored on ZFS, but all of them were made FROM a ZFS source, meaning they are now all suspect, since silent corruption has been possible since version 2.1.4, and maybe even longer than that.
ZFS is practical to use, but for now I think I'll keep a history of file checksums, like how it was done before bitrot protection.
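Such a checksum history can be kept with plain coreutils; a minimal sketch (the data path and output file are made up):

```shell
# Record a checksum snapshot of a tree.
cd /data
find . -type f -print0 | xargs -0 sha256sum > /root/data-checksums.sha256

# Later: re-verify every file against the recorded snapshot; any file
# that rotted (or was legitimately modified) is reported as FAILED.
sha256sum -c /root/data-checksums.sha256 --quiet
```

The weakness, compared to filesystem-level checksums, is that the list itself goes stale as soon as files are intentionally changed, so it works best for archival data.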
Still, that could happen with any other FS. Silent corruption is actually quite a bit more common than winning the lottery, so in that regard ZFS is actually a good step toward reducing those odds, even if it's not zero (as the package advertises).
I'm also assuming those backups aren't actually ZFS streams (from zfs send|receive) which is a special case of "bugs biting you twice" :P
I just benefited from ZFS boot environments/snapshots during the upgrade of FreeBSD from 13.2 to 14.0-RELEASE. It's a great tool to have at your disposal.
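For anyone who hasn't used them, the workflow is roughly this (the boot environment name is arbitrary):

```shell
# Snapshot the running 13.2 system into a new boot environment
# before touching anything.
bectl create pre-14-upgrade

# Upgrade as usual.
freebsd-update -r 14.0-RELEASE upgrade
freebsd-update install

# If 14.0 misbehaves, recovery is one reboot away:
bectl activate pre-14-upgrade
shutdown -r now
```

Because boot environments are ZFS clones, the pre-upgrade copy costs almost no space until the datasets diverge.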
I had the exact same reaction. Saw the massive list of new features and decided to wait for .1 or .2 releases to shake out the bugs. Seems like patience is paying off as always.
I've heard of and experienced plenty of problems with BTRFS, but it's pretty rare that I hear of problems with ZFS (aside from the current article, obviously); what problems have you hit? (As a very heavy user of ZFS, I would very much like to know about possible problems before they bite me)
I have two identical Debian 11 on ZFS servers with ZFS encryption enabled. On one pool on one machine it will start reporting errors after about 1 week of uptime. The errors are caused by failure to decrypt within the Kernel Cryptography Framework. No read/write/cksum errors are reported for any of the zpool devices. Running a scrub finds no errors and clears the errors. The presence of the errors causes my snapshot replication to fail so I need to reboot the server weekly and run a scrub. One person has reported that updating to 2.1.13 fixed the issue, IIRC. That version hasn't been released to Debian 11 yet and that version also included a commit which removed the KCF kstat code so I wouldn't be able to monitor the KCF errors anymore.
Another issue I ran into is when SQLite is run in synchronous mode (the default) with WAL (not default but recommended) and the default locking mode (so it creates an -shm file) and the shm file is stored on a ZFS filesystem. SQLite frequently executes ftruncate on the shm file when different processes access the same SQLite database, and for some reason ZFS can cause the ftruncate call to block until the txg timeout which is usually 5 seconds [0]. If you were running a program which records every shell command you run to a SQLite database, for example, that would cause a 5 second hang before any shell command would execute [1]. The workaround is to disable synchronous in SQLite or the ZFS dataset, which is probably safe because of the ZFS atomicity guarantees.
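The two workarounds look roughly like this (the dataset and database paths are made up; with WAL, synchronous=NORMAL drops the per-commit fsync while OFF disables syncing entirely):

```shell
# Per-database: relax SQLite's fsync behavior for a WAL database.
sqlite3 /tank/app/db.sqlite 'PRAGMA journal_mode=WAL; PRAGMA synchronous=NORMAL;'

# Per-dataset: tell ZFS to treat sync writes as async. Transaction groups
# still commit atomically, which is the safety argument mentioned above.
zfs set sync=disabled tank/app
```

Note that `synchronous` is a per-connection pragma (unlike `journal_mode`, which is persistent), so the application has to set it on every open.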
Those are two examples I have run into recently. I'm sure people run into issues with e.g. ext4 as well but I think they are a bit more frequent with ZFS on Linux, especially if you use more fringe features like encryption.
Unfortunately, encryption never was quite as robust as the rest of ZFS. That said, things have become much better in the past year, as there have been targeted efforts focused on improving it, which found and fixed multiple bugs.
That said, most of the developers are likely not using it on their own systems, which probably allowed bugs to stick around abnormally long compared to the rest of the code base (although some really ancient bugs were found last year in other areas).
Since you asked: it was many years ago now (5+), but it hurt. I lost a 16TB cluster to btrfs; the RAID just failed for no apparent reason (this wasn't uncommon back then, according to the forums). And on ZFS it was a kernel bug where, on boot, it would mount ZFS after waiting for root, so root never appeared. (This was due to ZFS running as a user-space driver on Linux, back when it wasn't yet in the kernel.)
You can find many corruption issues on the OpenZFS bug tracker. The btrfs corruptions are a bit of a meme now, and everyone has their own story of a failure from long ago (old system, unsupported feature), so it's quite hard to really compare how stable they are in realistic scenarios.
This wasn’t the only block cloning bug. They also completely broke cp from unencrypted to encrypted by cloning unencrypted blocks into the encrypted file system.
I was excited to try ZFS on a separate disk with Ubuntu, until I screwed up my Windows boot... guess it will be a while before I get to try it while I fix my mess.
ZFS is undeniably impressive – I first started using it back in 2009 with OpenSolaris. This year, motivated mainly by the power consumption of my previous OmniOS setup, I made the move to the "dark side" with Synology. After more than a decade of relying on ZFS, I must admit, the transition to something less rigid but still robust has been quite refreshing.
It could be a subjective feeling, but the recent years of OpenZFS development have reminded me a bit of the OpenStack experience. It seemed like almost anyone could contribute, sometimes resulting in features that were questionable in terms of stability, development, and overall thoughtfulness. Perhaps this is why iXsystems has taken a more cautious (albeit slower) approach in enabling new features in TrueNAS.
What I liked about Solaris, and illumos-based OSes, is the time slider built into the file manager. Would love to see something like that in FreeBSD and/or Linux.
How the heck did this ever make it through? I don't think this ever would have happened pre-OpenZFS, and undermines the stable reputation of ZFS. OpenZFS needs to do better and look to FreeBSD developers, not Linux developers, as role models.
>This is often regarded as a good thing in the BSD world. In all fairness, this bug just enforces that notion.
Same in Linux, which is why you'd choose a distro that's less bleeding edge.
>And Solaris (at least on amd64) wasn't slow.
No, it was slow. There's a reason why everyone in HPC/HFT/etc. moved off Solaris to Linux in the 2000s. Linux was regularly beating Slowlaris in practically every category at the end.
I'm reading through the bug report and the investigation in the comments. Is there something here that makes this bug extremely obvious to you that nobody else is comprehending?
It seems like a meme that GPL software developers are more ego-driven than BSD/MIT software developers, and so bike shedding and new features take precedence over correctness/simplicity/beauty. I was a decade-long Gentoo user frustrated by crashy software. I switched back to pirated Windows (I would never be caught dead paying for Windows), only to find the same trash developer habits had carried over in my absence, and now Windows sucks just as much plus IT SPIES ON YOU!!! I'm now on Qubes OS, hoping Xen is well enough built to assuage my anxieties.
The Linux developers involved use the CDDL. While some of them have patches in Linus' tree, most of them are described as Linux developers because they develop and use the code on Linux. They have little to do with mainline Linux, partly because certain mainline Linux developers will actively go out of their way to antagonize them if they even try to contribute to mainline. Certain others will simply ignore their emails. I will not say names, but I have had that happen to me in the past.
That said, I suspect you did not read the replies, since the code in question was written by a FreeBSD developer. Bugs in new code have been introduced by developers on both platforms. Unfortunately, the bugs in this feature were not caught before they reached a stable release. :/