07:18:16 [illumos-gate] 16289 gss_mechs: left shift of negative value -- Toomas Soome
07:21:46 [illumos-gate] 16281 libnsl: left shift of negative value -- Toomas Soome
07:24:16 [illumos-gate] 16292 gss_mechs: symbol 'asn1buf_insert_octet' is multiply-defined -- Toomas Soome
16:29:35 [illumos-gate] 16293 libsldap: passing argument 1 to 'restrict'-qualified parameter aliases with argument 4 -- Toomas Soome
16:32:07 @tsoome_ --> I'd prefer richlowe look at 16301 (sgs related).
16:32:19 I figured so:)
16:33:40 [illumos-gate] 16295 nc: 'restrict'-qualified parameter aliases with argument -- Toomas Soome
16:37:29 [illumos-gate] 16296 snoop: 'restrict'-qualified parameter aliases with argument 4 -- Toomas Soome
16:41:05 [illumos-gate] 16298 mem: 'restrict'-qualified parameter aliases with argument 4 -- Toomas Soome
19:32:36 jbk: sorry, I realize this conversation has like 1.5 day latency, but what am I missing to make this fully light up as failed automatically?
19:45:59 Thanks richlowe
20:01:06 KungFuJesus: The lights are generally managed by FMA, and for the fault light to be on, the system needs to think that the disk is broken and match that up with an FMA case as far as I know. So we'd need to look at things like: why do you think the disk is faulty, and why does the system apparently not think that; also, what actions have you taken, etc, since determining the disk is faulty.
20:01:30 do you need topo for the chassis too?
20:02:21 Earlier they mentioned they have a SES enclosure, which is self-describing
20:02:36 Also that the lights _are_ able to be turned on, manually, we just turn them off later.
20:03:15 There is often no good way to ask the firmware in these things "is your light on already?" so we tend to resort to flushing out new LED state periodically
20:03:30 oh, I missed that
20:04:07 In the limit if you want to be able to manually override the LED state, we'll need to come up with some way to allow that -- to mediate the desires of both the operator _and_ the fault management stuff, basically
20:04:17 Which is totally possible but is not a mechanism that exists today
20:05:03 unrelated: I want to split pkg:/system/kernel/platform, because it's a mess if there's more than one. Do people prefer pkg:/system/kernel/platform/{i386,aarch64} (cf. SUNWcakr.x), or two distinct manifests that each deliver the same FMRI?
20:05:13 rather than the mess of $(i386_ONLY) that happens otherwise
20:05:49 you can see the mess, in fact, even _without_ 2 platforms.
20:08:35 technically there's a 3rd option: make pkg:/system/kernel/platform i386-only, and other platforms deliver pkg:/system/kernel/platform/, but I hate that one
20:29:40 [illumos-gate] 16301 sgs: 'restrict'-qualified parameter aliases with argument 4 -- Toomas Soome
20:31:42 I think I prefer a separate FMRI for each. There is some precedent with pkg:/driver/i86pc/platform for the arch to come before 'platform' though
20:32:22 I definitely support splitting it up!
21:14:40 jclulow: so, is fmd the thing that's actually turning the LED back off?
21:15:21 so if that's the case, fmd apparently then isn't properly connecting that bay and disk. What's weird is that I used fmtopo -V to tell me the faulted disk in the pool
21:15:32 fmd reported the disk as faulted and fmtopo knows the bay it's in
21:15:52 one thing that perhaps makes it weird is that it's going through VHCI for multipathing. Is that perhaps part of the problem?
21:17:32 unfortunately at the moment the disk is already being replaced, so we maybe lost our chance at a guinea pig opportunity
21:18:40 https://pastebin.com/LXsP5cEs
21:20:28 jclulow: the fault event: https://pastebin.com/7dnpKUhi
21:20:54 wtf, it even names the slot in the event, lol
21:21:25 at this point I don't know what else I could give fmd to know any better than it does
21:22:20 KungFuJesus: That's good information!
21:22:33 indeed, why doesn't it know how to act on it?
21:30:33 An excellent question! This fmd module is supposed to drive the lights based on that sort of thing: https://github.com/illumos/illumos-gate/tree/master/usr/src/cmd/fm/modules/common/disk-lights
21:31:36 So you have a FRU with the right shaped FMRI: hc://:product-id=SMC-SC846P:server-id=:chassis-id=5003048017d2467f:serial=8DK8LE1H:part=HGST-HUH721212AL4200:revision=A3D0/ses-enclosure=0/bay=12/disk=0
21:31:46 i.e., a DISK under a BAY
21:32:09 I think maybe the disconnect here is that we've got a "fault.fs.zfs.vdev.io" fault
21:32:33 But the disk-lights module is subscribing to fault.io.{disk,scsi}.*
21:32:52 so, modify the config file to point to fault.fs.zfs.vdev.io?
21:33:36 Well
21:33:40 I am not sure
21:34:00 But perhaps!
21:34:06 hah, could it hurt?
21:34:49 Well, anything could hurt haha -- but in this case I think the likelihood is low
21:35:01 It doesn't look like the actual C code has anything specific about the fault classes, https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/fm/modules/common/disk-lights/disk_lights.c#L188-L195
21:35:28 So maybe subscribing it to more fault classes will be enough, if fmd_nvl_fmri_has_fault() manages to link up the vdev fault with the disk FMRI -- but I don't know if it will
21:35:53 It's also possible that the problem is at a much higher level, and that we should have some kind of translation of a vdev-level fault into a disk-level fault
21:36:32 But yeah I think this is where the disconnect is!
21:36:46 I also think this will be reproducible without your specific system
21:37:10 But you could preserve the contents of /var/fm right now to be sure that we have all the data
21:37:31 (it has all the log data and stuff that fmd uses to make these diagnoses)
21:37:47 sure thing - the disk itself has been removed from the pool and is being replaced but I suspect that won't matter if it's historically logged
21:38:06 I'll tar it up, where should I send it to?
21:55:32 jclulow: thanks for making me look at what `fmsim` does :\
21:55:58 KungFuJesus: how big is the tar.gz
21:57:23 ~2MB
21:59:46 KungFuJesus: I think you can just upload it to this issue as an attachment probably: https://www.illumos.org/issues/16353
21:59:47 → BUG 16353: disk fault lights should come on for ZFS vdev-level faults (New)
22:03:08 cool, uploaded
22:07:04 andyf: (and others interested), my current preference is to treat platform as a variable, that is, s/platform/i86pc/ s/platform/aarch64/ s/platform/raspberry-pi-4/ in those (and other?) packages.
22:07:55 looking at *platform*.p5m, it looks the most reasonable
22:21:30 richlowe: I would like to be able to install _just_ the oxide parts and not the i86pc parts, if that helps
22:21:48 (oxide being an i86pc-level peer in the taxonomy)
22:31:13 Which I suspect means we should codify some kind of facet structure for pulling those in, as well
22:33:18 Although I suppose it's possible nothing explicitly depends on /system/kernel/platform today so maybe that won't be needed
22:33:40 jclulow: In the model I suggested above, oxide would be system/kernel/oxide
22:33:54 oxide as a moral equivalent to a raspberry-pi-4
22:33:57 except a touch more expensive
22:34:46 so system/kernel/i86pc system/kernel/i86xpv(?) system/kernel/oxide system/kernel/aarch64 system/library/i86pc
22:34:51 etc.
22:35:03 literally replace the word "platform" with the platform
22:35:13 and split the shitty ones as necessary (that's just system/kernel/platform)
22:35:52 rhetorical question: what else lives under system/kernel/foo that isn't a platform?
22:37:01 oh, hmmmm, specific kernel features.
22:37:15 system/kernel/cpu-counters for eg
22:37:28 Yeah I think you want /system/kernel/platform/$platform
22:37:49 yeah, and then that really needs to be /system/kernel/$platform/platform, maybe, to follow the majority of the mess we've made
22:38:09 see driver/i86pc/* etc.
22:38:12 Apparently there was once a /system/kernel/platform/netra so this is not really new
22:38:14 I hate naming stuff
22:38:22 though not as much as I hate that package
22:38:31 jclulow: oh, in that case I can go back to how I did it at first!
22:38:36 that's good to know
22:41:44 But yeah we currently, when building an oxide-specific ramdisk, end up just removing a lot of the stuff from that (and similar) packages anyway: https://github.com/oxidecomputer/helios/blob/master/image/templates/gimlet/ramdisk-02-trim.json#L28-L35
22:42:22 yeah, hopefully I can split this in a way that makes that better.
22:42:26 Incremental improvements to making it easier to do that by just installing the bits you need would be awesome
22:44:49 I'll do what I can, I just hate having a manifest that is literally two entirely distinct packages :)