OpenZFS boot pool snapshots may cause grub boot failure
After a recent reboot, a system running Debian Bookworm with Root on ZFS 2.2.2 and grub 2.06 left me unceremoniously at the grub (bootloader) prompt. Trying to access the boot partition (hd0,gpt3) in grub resulted in the error:
compression algorithm inherit not supported
OpenZFS Bug #13873 explains the issue that snapshots of the top level of the boot pool can leave grub unable to deal with the compression type, though it's really grub bug #64297. The issue is known, but this was my first encounter following occasional boot pool snapshots. In my case it may have coincided with upgrading to ZFS 2.2.2 from bookworm-backports, which at least one other reporter believed was their experience too.
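To see whether a boot pool has any snapshots on its top-level dataset, something like the following could be used. It is only a sketch: the function name top_level_snapshots is made up for illustration, and bpool is assumed as the pool name. It checks for the presence of such snapshots; it says nothing on whether removing them would make grub happy again.

```shell
#!/bin/sh
# Hypothetical helper: read snapshot names (one per line) on stdin and keep
# only those taken on the pool's top-level dataset, i.e. names that start
# with "<pool>@" rather than "<pool>/child@".
top_level_snapshots() {
    awk -v p="$1" 'index($0, p "@") == 1'
}

# Intended use on a system with the pool imported (bpool is an assumption):
#   zfs list -H -t snapshot -o name -r bpool | top_level_snapshots bpool
```

An empty result means that no snapshots exist on the top-level dataset itself, though child datasets such as bpool/BOOT/debian may still have their own.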
The issues involved here might prompt a useful reminder of the importance of:
- regular backups, tested so they can be restored from
- not making changes to disk structures without careful consideration and checking
So, reinstall and restore or try to recover?
The quickest and most straightforward option seemed to be to recover by recreating the boot pool and reinstalling grub.
The OpenZFS troubleshooting guide for Debian shows how to import pools (in this case both boot and root) using a LiveCD/USB. This succeeded, so data could have been backed up or recovered at this stage if necessary, though that would have been far from ideal. The boot pool contents were copied to a FAT32-formatted USB stick. The system was able to boot from this using the previous kernel and the grub commands:
grub> linux (hd0)/vmlinuz-6.1.0-16-amd64 root=ZFS=rpool/ROOT/debian ro
grub> initrd (hd0)/initrd.img-6.1.0-16-amd64
grub> boot
Note, however, that storage device and partition names, as well as subsequent paths, may differ between systems, and even between reboots of the same system, depending on which storage devices are connected at the time. Running ls and ls (hdX)/ at the grub prompt served to check.
When booted from the USB stick containing bpool's contents, the latest kernel stalled at the initramfs screen (which I think is the end of grub's involvement). After rebooting from the previous kernel, bpool was destroyed and then recreated as in step 2.4 of the original instructions, and the contents of the USB stick copied into it.
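The destroy, recreate and copy-back sequence might look roughly like the sketch below. It is not a record of the exact commands I ran. The pool options are assumed to match step 2.4 of the OpenZFS Debian Bookworm guide and should be checked against it. DISK and BACKUP are hypothetical placeholders, and the run and recreate_bpool helpers are made up for illustration. By default the script only prints the commands; nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Prints the commands by default; set DRY_RUN=0 to run them.
# DISK and BACKUP are assumptions, not values taken from the affected system.
DISK="${DISK:-/dev/disk/by-id/EXAMPLE-DISK}"  # hypothetical disk identifier
BACKUP="${BACKUP:-/media/usb}"                # hypothetical USB stick mount point

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_bpool() {
    # Remove the pool that grub can no longer read.
    run zpool destroy bpool
    # Recreate it with grub-compatible features (options assumed from the guide).
    run zpool create \
        -o ashift=12 -o autotrim=on \
        -o compatibility=grub2 \
        -o cachefile=/etc/zfs/zpool.cache \
        -O devices=off -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 -O normalization=formD -O relatime=on \
        -O canmount=off -O mountpoint=/boot \
        bpool "${DISK}-part3"
    # Recreate the dataset layout used by the guide.
    run zfs create -o canmount=off -o mountpoint=none bpool/BOOT
    run zfs create -o mountpoint=/boot bpool/BOOT/debian
    # Copy the saved boot files back from the USB stick.
    run cp -r "${BACKUP}/." /boot/
}

recreate_bpool
```

Details such as unmounting /boot/efi beforehand and remounting it afterwards are left out, and the -part3 suffix simply reflects the (hd0,gpt3) partition mentioned earlier.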
The command:
# grub-install /dev/sda
allowed the system to boot from the new bpool using the previous kernel.
This allowed the latest kernel to be (re)installed, and a second
# update-grub
fixed a
no such device: <long hex number>
error appearing on the initramfs screen, which caused an approx 10s delay in booting.
The hex number appeared several times in /boot/grub/grub.cfg and was replaced there after update-grub succeeded. update-grub is supposed to run when a kernel is (re)installed, so it seems likely that there was an issue I didn't notice at that point, but all has been in order since.
There are recent reports of both success and failure with boot pool snapshots with grub 2.12, so the issue doesn't seem to be conclusively fixed at the time of writing.
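Returning to the stale identifier in grub.cfg: a quick way to see which filesystem identifiers the file refers to is sketched below. The function name list_grub_fs_uuids is made up for illustration, and the sketch assumes that the identifier is the last word on each search line that uses --fs-uuid, which is how the generated entries I have seen are laid out.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct identifier used by
# "search ... --fs-uuid ..." lines in a grub.cfg, followed by how many
# times it occurs. Assumes the identifier is the last word on such lines.
list_grub_fs_uuids() {
    cfg="${1:-/boot/grub/grub.cfg}"
    grep -E '^[[:space:]]*search .*--fs-uuid' "$cfg" \
        | awk '{ print $NF }' \
        | LC_ALL=C sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Intended use:
#   list_grub_fs_uuids /boot/grub/grub.cfg
```

As I understand it, grub-probe --target=fs_uuid /boot prints the identifier grub currently derives for the filesystem holding /boot. If grub.cfg lists a different value, it was probably generated before the pool was recreated, and running update-grub again should refresh it.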
ZFSBootMenu is recommended as a designed-for-ZFS alternative to grub by multiple users, though the lack of (at least) Debian packaging adds complexity and potential issues with timing of (especially automatic) updates.
I am grateful to contributors to the OpenZFS Mailing List for their advice with this matter. I had wondered if my use of ZFS native encryption might have been complicating matters, but apparently not.
Note that native encryption is still somewhat buggy, and is not to be recommended for production use without extreme caution/conservatism. OpenZFS instructions (for Debian in this case) describe how to use LUKS encryption as an alternative, which offers established reliability, but with some inevitable compromises.