10:30:55 there is a small bug in /opt/onbld/env/omnios-illumos-gate -- export GNUC_ROOT=/opt/gcc-7/ is overwriting export GNUC_ROOT=/usr/gcc/10
12:46:31 tsoome - thanks, I'll open a PR to fix it.
13:42:51 thanks:)
20:58:37 OmniOS r151050 has been released - https://omnios.org/article/r50
20:59:37 r151038, the old LTS, is now end-of-life. Please upgrade to r151046 or r151050 to stay on a supported track. r151046 is an LTS release.
21:03:51 Does that mean 050 has been released?
21:04:37 Yes, it was released earlier today - https://omnios.org/article/r50
21:05:23 Thanks
21:11:20 andyf, any chance you could bump https://www.illumos.org/issues/16429 for me?
21:11:21 → BUG 16429: zpool hot spare replacement not grabbing similar/appropriate sized drive (New)
21:15:39 ok, I'm blind. I didn't even see this from fenix: OmniOS r151050 has been released - https://omnios.org/article/r50
21:15:39 Hi nomad
21:15:49 I'm having such. a. day.
21:26:40 I actually have no idea how hot spare selection works, that's interesting.
21:27:30 I'm sure it's nontrivial and impacts at least three things you'd never expect to be impacted.
21:27:47 but I'd love something as "simple" as "if there's another drive that matches the model of the failing one, use it."
21:27:57 if not, do whatever it currently does.
21:28:40 By 'another drive' I mean hot spare.
21:36:51 andyf: I went digging into the hot spare selection logic a while back. it's pretty simple.
21:37:20 walk through the hot spares in order and try to do a replace. If it fails (perhaps because the spare is too small), move to the next one.
21:37:21 It is - zfs_retire.c is a pretty short fm module
21:39:24 sommerfeld, what order?
21:39:47 If the order were "smallest to largest" that would solve the problem for me :)
21:40:20 https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/fm/modules/common/zfs-retire/zfs_retire.c#L225-L291
21:40:24 whatever order they're listed in the config, looks like.
21:40:30 But, so far, we've had at least two instances where the largest hot spare was picked, even though there was another drive of the same size as the failing one ready to go.
21:40:56 possibly the order they were originally added to the pool, but I'd have to do more digging.
21:41:25 It could also be effectively random, I don't know
21:43:55 One could envision introducing a "sort_spares(spares, nspares, dev_name)" in the middle of replace_with_spare() which ranks them in some order.
21:45:34 yeah, even I, a complete non-programmer, can tell that's a short, simple-ish module.
21:45:46 not that I would be at all comfortable trying to modify it.
21:51:11 nomad - I would be interested to see what order the disks appear in the output of this - mdb -ke '::spa -v'
21:51:29 give me a moment
21:54:34 I presume you also want the output of iostat -En to match types?
21:54:54 just see if the order there is the order you've seen replacements use - as in, is the biggest disk first?
21:55:01 You'll have to cross-reference, yes
21:55:26 https://pastebin.com/P4VCxzBr
21:57:09 That can't be the order then. Fair enough. Ordering by size does seem like a reasonable general approach.
21:57:36 "/dev/dsk/c0t5000C500D863CF37d0s0 - ST12000NM002G - 1200G
21:57:36 "/dev/dsk/c3t5000CCA2957A2819d0s0 - WUH721816AL5204 - 1600G
21:57:46 that's *after* we replaced drives, though.
21:58:01 so I wouldn't be surprised if it were in a different order before the most recent failure.
22:02:42 : || lvd@chrufs ~ [508] ; ls -ld /dev/dsk/c0t5000C500D863CF37d0s0 /dev/dsk/c3t5000CCA2957A2819d0s0
22:02:43 lrwxrwxrwx 1 root root 48 Mar 12 17:32 /dev/dsk/c0t5000C500D863CF37d0s0 -> ../../devices/scsi_vhci/disk@g5000c500d863cf37:a
22:02:43 lrwxrwxrwx 1 root root 88 Dec 6 2021 /dev/dsk/c3t5000CCA2957A2819d0s0 -> ../../devices/pci@7d,0/pci8086,2030@0/pci1000,30e0@0/iport@ff/disk@w5000cca2957a2819,0:a
22:03:09 going by that list, the smaller drive (which is listed first) was added more recently.
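The "sort_spares(spares, nspares, dev_name)" idea floated above could rank spares smallest-first with a plain qsort before the existing try-in-order loop runs. Again, this is a sketch of the idea only: the real replace_with_spare() operates on nvlist_t pointers, not a struct like the hypothetical `spare_t` here, and a real ranking might also want to consider the failing device's model.

```c
#include <stdlib.h>

/* Hypothetical spare record; the real replace_with_spare() works
 * on nvlist_t pointers from the pool config. */
typedef struct {
	const char *name;
	unsigned long long size;	/* bytes */
} spare_t;

/* qsort comparator: ascending by size, so the smallest spare that
 * can still hold the failed drive gets tried first. */
static int
spare_size_cmp(const void *a, const void *b)
{
	const spare_t *sa = a;
	const spare_t *sb = b;

	if (sa->size < sb->size)
		return (-1);
	if (sa->size > sb->size)
		return (1);
	return (0);
}

/* The sort_spares() step suggested above: reorder the array in
 * place before the existing try-in-order loop runs over it. */
static void
sort_spares(spare_t *spares, int nspares)
{
	qsort(spares, (size_t)nspares, sizeof (spare_t), spare_size_cmp);
}
```

Sorting smallest-to-largest would make the existing "first spare that works" loop naturally pick the best-fitting spare without any other change to its logic.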
22:14:27 I found a second host with different sizes of hot spares.
22:14:30 c3t5000CCA2A1CE1FEDd0s0 - WUH721816AL5204 - 16000.90GB - Oct 31 2023
22:14:30 c0t5000C500DAA00C1Bd0s0 - ST12000NM004J - 12000.14GB - Dec 8 17:35
22:15:15 that's the order they're listed in the mdb output.
22:16:12 based on this incredibly small sample size, it might be that the newer drive is always listed first. Maybe. Possibly. Too little data to be sure.
22:19:50 I uploaded a small program to dump the spares the way that the retire module sees them
22:19:58 it's https://downloads/omnios.org/misc/dump_spares
22:20:32 * nomad grins
22:20:35 No size information there, so there would be a bit more work needed to sort them
22:20:47 Wait, I'm about to download and run something from IRC? As root? Can this really be happening? :)
22:21:16 * nomad doesn't have a host named 'downloads'
22:21:35 did you mean downloads.omnios.org?
22:21:50 Yes, I did!
22:22:37 heh, thought so.
22:22:47 The output is in the same order as I pasted here.
22:23:11 glad to see I didn't need to sudo to run it.
22:23:58 ... on both hosts.
22:24:09 (same order output ... on both hosts)
22:24:33 That's the order that the disks are spared in, anyway. We probably would just need to look up the drive size and try them in ascending order to make things better for you.
22:25:31 That certainly sounds reasonable to me.
22:29:52 In the output from that `dump_spares`, what do the `vdev_stats` lines look like?
22:31:57 https://pastebin.com/HytJvC7a
22:31:58 The 7th field should be the size
22:32:48 Looks like we even have the information to hand then
22:34:22 sure looks like it
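If the size really is the 7th field of each spare's vdev_stats array (as suggested above; that should be verified against vdev_stat_t in the illumos headers before relying on it), picking the smallest spare that is still big enough could look roughly like this. The flat uint64 stats array here is a hypothetical model of what `dump_spares` prints, and `SPARE_SIZE_FIELD` is an assumed index, not a real illumos constant.

```c
#include <stdint.h>

/* Assumed index of the size field: "the 7th field" from the
 * discussion above, i.e. index 6 in a zero-based uint64 array.
 * Verify against vdev_stat_t before using this for real. */
#define	SPARE_SIZE_FIELD	6

/* Given per-spare stat arrays (as dump_spares might print them),
 * return the index of the smallest spare that is still at least
 * failed_size bytes, or -1 if none qualifies. */
static int
smallest_adequate_spare(const uint64_t *stats[], int nspares,
    uint64_t failed_size)
{
	int best = -1;

	for (int i = 0; i < nspares; i++) {
		uint64_t sz = stats[i][SPARE_SIZE_FIELD];

		if (sz < failed_size)
			continue;
		if (best == -1 || sz < stats[best][SPARE_SIZE_FIELD])
			best = i;
	}
	return (best);
}
```

This is the "look up the drive size and try them in ascending order" policy from the conversation, expressed as a single best-fit pick rather than a sort.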