05:58:54 are smartos issues visible publicly? wondering about the OS-6216 (https://github.com/TritonDataCenter/illumos-joyent/commit/80caa2e90432a6929d577979f0d62a577d583b69)
06:43:19 yuripv: https://smartos.org/bugview/OS-6216
06:43:21 → OS-6216: VOP_ACCESS() use in sdev_readdir() leads to deadlock (Resolved) | https://github.com/joyent/illumos-joyent/commit/80caa2e90432a6929d577979f0d62a577d583b69
06:43:28 rmustacc: thank you!
10:09:43 yuripv: oh god the horror
10:47:28 jlevon`: why? :)
10:47:48 we seem to be hitting that issue intermittently
11:48:22 got ZFS crashes: https://pasteboard.co/1gedb0wwniJJ.png (after an update 2 days ago), https://pasteboard.co/aBtHf4N3xvsg.png (from possibly a month-old OI, mid-December'ish)
11:49:47 A pool scrub from linux rescue ISO (the only one DigitalOcean lets use) did not complain. I think the crashes may be triggered by znapzend activity, but can't vouch for that.
11:53:20 the later one is from OpenIndiana Hipster 2023.10 Version illumos-00f13cd38a 64-bit (per early boot prompt)
11:55:24 disabled znapzend and launched a scrub from OI... will see if it crashes or finds anything that illumos codebase dislikes...
12:24:00 so, illumos zpool scrub also passed well
12:24:16 did you try to import without zpool cache file?
12:24:39 i vaguely remember that i've seen mysterious crashes with solaris 11 zfs that were connected to zpool cache somehow
12:26:14 I think rpool goes imported without one - got nowhere to read it from yet?
12:26:31 ah ok, it goes on rpool.
12:26:43 (i did not check your pastebins)
12:27:23 but FWIW, /etc/zfs/zpool.cache is updated today on the running system (probably during boot, roughly same age as uptime)
12:30:20 i was looking for screenshot from "my" crashes (i surely have some, but can't seem to find them)
12:35:26 yuripv: just sdev is a locking horrorshow
14:07:08 Checking that the fault was related to a running znapzend service - and indeed it seems to have been due to its clean-up of older snapshots (in manual run with debug), as seen in topmost line at https://pasteboard.co/OdhyUQgFbPoG.png . The `zpool scrub` is clean however...
14:09:10 Locking up during reboot (claimed in earlier discussions to be something between illumos kernel and QEMU running the VM) is annoying - gotta power-cycle the VM or suffer an outage of about an hour or two until it does reboot unattended. But to get such screenshots it is actually helpful :D
14:13:14 I'll try to clean that snapshot away manually, maybe with the DigitalOcean Linux recovery ISO, in hopes that it is the only such troublemaker
14:26:56 that was fun... I told DigitalOcean to turn off the vm, it said it did. I changed the boot device and turned it on - and the console still showed the ZFS stacktrace and attempt to reboot. Power-cycling helped, though...
14:34:47 so, linux zfs had no qualms dropping that snapshot chain
14:34:55 trying znapzend from OI again...
14:53:15 yeah, seems that one snap was somehow special; now znapzend steps through a lot of other obsolete iterations without crashing the kernel
15:12:50 jlevon: luckily for me, you already fixed at least this issue :D
15:22:47 true
16:02:00 yep, so not the whole run of znapzend went well
16:02:29 having competing implementations is useful :)
16:40:25 jimklimov: traceback mentions zio_ddt_free - do you have dedup enabled on any filesystems in the pool?
20:19:23 jimklimov: https://www.illumos.org/issues/14526
20:19:23 → BUG 14526: illumos guest hangs on reboot under QEMU 6.0.0 (New)
20:19:27 You are most welcome to debug it!
20:20:40 It's not surprising that it fucks up the DO control plane FWIW. I expect they're just proxying a request to reboot through to QEMU through the monitor protocol, and it's pretty clear that QEMU has a ridiculous bug that wedges the whole emulation stack in this case.
20:21:07 If you turn it off, they probably go and actually kill the process and then fire it up on a new server when you start it up again.