-
gitomat
[illumos-gate] 16241 lmrc shouldn't panic on unknown MFI completion status -- Hans Rosenfeld <rosenfeld⊙gho>
-
gitomat
[illumos-gate] 16265 sh: parameter defaults to 'int' -- Toomas Soome <tsoome⊙mc>
-
gitomat
[illumos-gate] 16255 lpadmin: type of 'sig' defaults to 'int' -- Toomas Soome <tsoome⊙mc>
-
gitomat
[illumos-gate] 16262 pmadm: 'oflag' defaults to 'int' -- Toomas Soome <tsoome⊙mc>
-
jbk
hmm.. it might be nice to be able to control the debug/log level of syseventd from smf (right now you have to either modify the method script or run it by hand)
-
jbk
it appears that for a multi-pathed disk, if all of the paths to a disk go away, we never actually generate an EC_DEV_REMOVE event, just path offline events (which don't appear to be consumed by anything in illumos-gate)... anyone happen to know the rationale for that?
-
jbk
i'm looking at a scenario where a pool w/ redundancy (mirrors or raidz[2] -- shouldn't matter as long as there's something) has a disk pulled in error and then re-inserted
-
jbk
for a multi-path disk, it appears the dev_remove and dev_add sysevents are never generated.. just the LU online/offline ones
-
jbk
it'd be nice (perhaps if autoreplace is enabled on the pool) if the vdev were automatically onlined (assuming its state was REMOVED and not FAULTED)
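A minimal sketch of the policy described here, assuming a hypothetical handler that already knows the vdev's current state and the pool's autoreplace setting. The enum and function names are illustrative only; they are not the real ZFS vdev states or any existing syseventd module code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model only: these enums are hypothetical and do not
 * mirror the real ZFS vdev_state_t values.
 */
typedef enum {
	SKETCH_VDEV_HEALTHY,
	SKETCH_VDEV_REMOVED,	/* device disappeared (e.g. pulled by mistake) */
	SKETCH_VDEV_FAULTED	/* device diagnosed as bad */
} sketch_vdev_state_t;

typedef enum {
	SKETCH_EV_DEV_ADD,	/* an EC_dev_add.disk style arrival */
	SKETCH_EV_DEV_REMOVE,
	SKETCH_EV_OTHER
} sketch_event_t;

/*
 * Decide whether an arrival event should bring the vdev back online:
 * only a REMOVED (not FAULTED) vdev, and only when autoreplace is on.
 */
bool
should_online_vdev(sketch_event_t ev, sketch_vdev_state_t state,
    bool autoreplace)
{
	assert(state <= SKETCH_VDEV_FAULTED);

	if (ev != SKETCH_EV_DEV_ADD || !autoreplace)
		return (false);
	return (state == SKETCH_VDEV_REMOVED);
}
```

Gating on REMOVED rather than "anything not healthy" keeps a re-inserted disk that was previously diagnosed as FAULTED from silently rejoining the pool.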
-
sommerfeld
jbk: I've observed an alternate problem -- single-pathed disk that's a member of a ZFS mirror goes away temporarily, and it goes into state REMOVED rather than OFFLINE, which makes resilver take a lot longer (no DTL).
-
rmustacc
So the question is how did it go away?
-
jbk
in my scenario, it was physically pulled (imagine this was done by mistake because an enclosure's indicator lights aren't the most well thought out)
-
jbk
vs. a disk failing
-
jbk
sommerfeld: IIRC, i think that's the ldi callback in vdev_disk.c that's making that specific state change
-
sommerfeld
in my case I haven't isolated what actually happened in detail - but generally the disk came back after a system power cycle.
-
rmustacc
jbk: Yeah, was thinking more so about sommerfeld's case.
-
rmustacc
So I think, sommerfeld, what we'd want to figure out there is how to distinguish presence from electrical connectivity.
-
rmustacc
Which depending on the HBA / device may not be possible.
-
sommerfeld
in my case SATA hardware connected to an LSI SAS controller.
-
sommerfeld
but don't let my sketchy hobbyist hardware derail jbk's issue
-
tsoome
well, the first question is, do we need to distinguish all paths offline versus device removed cases?
-
rmustacc
In general it is helpful to distinguish the two where you have out of band presence.
-
rmustacc
For example, consider a single-path connection to a PCIe device. The device can be offline due to power control, link state, etc.
-
rmustacc
However, removed is only triggered when you actually have physically removed it.
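A minimal sketch of the distinction drawn here, assuming a hypothetical out-of-band presence signal (for example a slot presence-detect bit) that, as noted, may not be available depending on the HBA or device. All type and function names are illustrative, not an illumos interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical out-of-band presence signal from the slot/enclosure. */
typedef enum {
	PRESENCE_UNKNOWN,	/* HBA/slot gives no presence information */
	PRESENCE_PRESENT,
	PRESENCE_ABSENT
} presence_t;

typedef enum {
	DEV_ONLINE,
	DEV_OFFLINE,		/* present, but unreachable (power, link, ...) */
	DEV_REMOVED,		/* physically absent */
	DEV_UNREACHABLE		/* no presence info: can't tell which */
} dev_disposition_t;

/*
 * Only a positive "absent" signal yields REMOVED; losing the link while
 * still present is OFFLINE; without presence info we can't tell.
 */
dev_disposition_t
classify_device(presence_t presence, bool link_up)
{
	assert(presence <= PRESENCE_ABSENT);

	if (presence == PRESENCE_ABSENT)
		return (DEV_REMOVED);
	if (link_up)
		return (DEV_ONLINE);
	if (presence == PRESENCE_PRESENT)
		return (DEV_OFFLINE);
	return (DEV_UNREACHABLE);
}
```

The DEV_UNREACHABLE case reflects the point that, without out-of-band presence, a consumer has to pick a conservative interpretation rather than assume removal.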
-
tsoome
yes, well, that's assuming that you can either detect the removal event or the removal is initiated via a command - that is, you let the system know that dis device will be removed.
-
tsoome
s/dis/this/
-
rmustacc
Sure, it's interface specific ultimately.
-
jbk
sommerfeld: if you're using smartos, there is a sysevent utility (it might be worth upstreaming that) that can be handy for monitoring which events are firing -- that might help narrow down what's driving the state changes (it obviously won't tell you why, but it might at least make it easier to zero in)
-
jbk
my inclination (at least for my purposes) is to treat the LU online event like an EC_DEV_ADD event...
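A minimal sketch of that normalization, mapping a sysevent class/subclass pair onto one "disk arrived / disk left" notion. The EC_dev_add / EC_dev_remove classes and the "disk" subclass match what shows up in the fmd info log in this discussion; the multipath class and subclass string values (for ESC_SUN_MP_LU_ADD / ESC_SUN_MP_LU_REMOVE) are ASSUMPTIONS and should be checked against <sys/sysevent/eventdefs.h>. In a real syseventd module the two strings would come from sysevent_get_class_name() and sysevent_get_subclass_name(); here they are plain arguments so the logic stands alone.

```c
#include <assert.h>
#include <string.h>

/* As seen in the fmd info log: EC_dev_add.disk / EC_dev_remove.disk. */
#define	SK_EC_DEV_ADD		"EC_dev_add"
#define	SK_EC_DEV_REMOVE	"EC_dev_remove"
#define	SK_ESC_DISK		"disk"

/*
 * ASSUMPTION: placeholder values for the multipath LU events; verify
 * the real class/subclass strings in <sys/sysevent/eventdefs.h>.
 */
#define	SK_EC_SUN_MP		"EC_sun_mp"
#define	SK_ESC_MP_LU_ADD	"ESC_sun_mp_lu_add"
#define	SK_ESC_MP_LU_REMOVE	"ESC_sun_mp_lu_remove"

typedef enum {
	DISK_EV_NONE,		/* not a disk arrival/departure */
	DISK_EV_ADD,
	DISK_EV_REMOVE
} disk_event_t;

/* Treat an MP LU add/remove the same as EC_dev_add/EC_dev_remove.disk. */
disk_event_t
normalize_disk_event(const char *cls, const char *subcls)
{
	assert(cls != NULL && subcls != NULL);

	if (strcmp(subcls, SK_ESC_DISK) == 0) {
		if (strcmp(cls, SK_EC_DEV_ADD) == 0)
			return (DISK_EV_ADD);
		if (strcmp(cls, SK_EC_DEV_REMOVE) == 0)
			return (DISK_EV_REMOVE);
	}
	if (strcmp(cls, SK_EC_SUN_MP) == 0) {
		if (strcmp(subcls, SK_ESC_MP_LU_ADD) == 0)
			return (DISK_EV_ADD);
		if (strcmp(subcls, SK_ESC_MP_LU_REMOVE) == 0)
			return (DISK_EV_REMOVE);
	}
	return (DISK_EV_NONE);
}
```

Downstream logic (e.g. deciding whether to online a vdev) then only has to handle DISK_EV_ADD and DISK_EV_REMOVE, regardless of whether the disk is multi-pathed.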
-
rmustacc
All sysevents that fired are recorded in the fmdump info log, fwiw.
-
sommerfeld
jbk: so this is something that has only happened maybe once or twice a year (and not recently, so I'm probably jinxing myself..)
-
rmustacc
So no need to actively monitor them.
-
jbk
rmustacc: i didn't know that, but now that brings up another question... running syseventd -d, it was reporting ESC_SUN_MP_LU_{ADD,REMOVE} events, but those aren't in the fmd info log
-
rmustacc
Dunno; most things I've looked for and used are there.
-
rmustacc
You'll have to dig, sorry.
-
jbk
but I see EC_dev_remove.disk and EC_dev_add.disk events in the fmd info log, but did not see those from syseventd
-
jbk
yeah that's what i expected :)
-
jbk
(to dig)
-
jbk
... i do wonder if the multipath bits are somehow causing those to be suppressed in a way that doesn't impact fmd
-
jbk
side note: usr/src/uts/common/os/log_sysevent.c looks like it could probably use id_space_t instead of vmem_alloc() and the like directly
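For context, the kernel's id_space interface (id_space_create(), id_alloc(), id_free()) hands out integer IDs from a fixed range and is itself a thin wrapper over a vmem arena, so the suggested change would mostly replace hand-rolled vmem calls with that narrower interface. The block below is only a userland toy with the same create/alloc/free shape so the pattern can be tried outside the kernel. It is NOT how id_space is implemented, its allocation order (lowest free ID first) is a choice of this toy, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy userland ID space: hands out integers in [low, high). */
typedef struct toy_id_space {
	int	tis_low;
	int	tis_high;
	bool	*tis_used;	/* one flag per ID in the range */
} toy_id_space_t;

toy_id_space_t *
toy_id_space_create(int low, int high)
{
	/* -1 is used as the "exhausted" sentinel, so IDs must be >= 0. */
	assert(0 <= low && low < high);

	toy_id_space_t *isp = malloc(sizeof (*isp));
	if (isp == NULL)
		return (NULL);
	isp->tis_used = calloc((size_t)(high - low), sizeof (bool));
	if (isp->tis_used == NULL) {
		free(isp);
		return (NULL);
	}
	isp->tis_low = low;
	isp->tis_high = high;
	return (isp);
}

/* Return the lowest free ID, or -1 if the space is exhausted. */
int
toy_id_alloc(toy_id_space_t *isp)
{
	for (int i = 0; i < isp->tis_high - isp->tis_low; i++) {
		if (!isp->tis_used[i]) {
			isp->tis_used[i] = true;
			return (isp->tis_low + i);
		}
	}
	return (-1);
}

/* Return an ID to the space so it can be handed out again. */
void
toy_id_free(toy_id_space_t *isp, int id)
{
	assert(id >= isp->tis_low && id < isp->tis_high);
	isp->tis_used[id - isp->tis_low] = false;
}

void
toy_id_space_destroy(toy_id_space_t *isp)
{
	free(isp->tis_used);
	free(isp);
}
```

The caller never touches arena addresses or sizes; it just asks for an ID and gives it back, which is the simplification the side note is after.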