00:10:11 I rode through the desert on a device with no name, it felt good to get out of the rain
01:04:36 a bicycle?
11:12:48 [illumos-gate] 16338 pynfs/nfs4.1: COUR2 st_courtesy.testLockSleepLock FAILURE -- Vitaliy Gusev
11:12:48 [illumos-gate] 17495 NFSv4.2 unsupported operations missing reply encoding -- Gordon Ross
11:28:03 [illumos-gate] 17515 ucodeadm could handle binary AMD microcode files -- Andy Fiddaman
14:05:43 Is Carsten Grzemba on here? (Maybe under a nickname that I probably know but didn't connect with $REAL_NAME?)
18:41:16 We are thinking about doing some better operations monitoring around knowing if zpools have issues / have drive errors, and maybe also exporting information about the host. I saw https://github.com/oxidecomputer/kstat-rs, which I imagine could be used to expose metrics for some kind of illumos exporter for prometheus. Did oxide or anyone write any fault management / zfs libs to be used from rust?
18:46:55 i think your question is actually two different questions
18:47:06 kstats are just stats
18:48:20 fault management is/should be receiving fault events and determining whether a component is faulty (and if so, marking it)
18:49:05 that data may include kstats, but the system itself has a mechanism for things to generate a fault event, which is distinct from bumping any stats
18:49:54 How would I listen to those events from rust / go to get informed?
18:50:33 I really just want to get away from running sdc-oneachnode to know when disks / pools may be going bad.
18:52:54 the only 'official' way currently is either running fmadm, or running net-snmp w/ the fmd module and doing an snmp-get on the table... i don't know if oxide has written a wrapper to the fmd libraries or not (they're 'private', but that just means 'these may change in a way that requires you to patch or recompile your code'), though distros can also manage that
18:53:27 having said that, there is some room for improvement with zfs and fmd
18:53:51 i've been wanting to do it for a while, but getting sufficient time with hardware (especially for testing -- that's probably far more challenging than the work itself)
18:54:08 has meant i haven't done it yet
18:55:07 for real testing of the upper layers of the fault management you probably want to augment bhyve's drivers with virtualized brokenness / fault injection.
18:55:09 HDDs at least (though I suspect SSDs -- either as a SCSI device or an NVMe device -- are probably similar enough) may have a bad 'spot' while the rest of the device is fine, and they have some number of 'replacement' blocks
18:55:38 today, if the drive can't read from a block, it returns an error
18:55:47 zfs interprets that as 'the whole disk is bad'
18:55:57 and will kick in a spare if available
18:56:15 what would arguably be better is
18:57:05 to mark that block as bad (drives can do this automatically) and, if sufficient pool redundancy is available (mirrors, raidz, ...), self-heal after the block(s) have been marked bad
18:57:14 as the drive will remap those to some of the reserve blocks
18:57:52 and most vendors will tell you how many bad blocks means 'bad drive'
18:58:31 (zfs only self-heals if the drive returns success, but with bad data)
18:58:35 at least today
19:05:53 jbk where can i read about the fmd module.... having this available via snmp would be decent.
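
[A minimal sketch of the poll-the-CLIs approach described above (18:52:54), for feeding a prometheus-style exporter: it just shells out to zpool status -x and fmadm faulty and turns the results into booleans. The function names, metric names, and reliance on the exact "all pools are healthy" string are illustrative assumptions rather than a stable interface, and fmadm generally needs elevated privileges.]

    use std::process::Command;

    /// True if `zpool status -x` reports every pool healthy.
    /// (Relies on the well-known "all pools are healthy" output.)
    fn pools_healthy() -> std::io::Result<bool> {
        let out = Command::new("zpool").args(["status", "-x"]).output()?;
        Ok(String::from_utf8_lossy(&out.stdout).trim() == "all pools are healthy")
    }

    /// True if `fmadm faulty` prints nothing, i.e. fmd has no active faults.
    fn no_active_faults() -> std::io::Result<bool> {
        let out = Command::new("fmadm").arg("faulty").output()?;
        Ok(out.stdout.iter().all(|b| b.is_ascii_whitespace()))
    }

    fn main() -> std::io::Result<()> {
        // In a real exporter these would become gauge metrics; here they are
        // just printed in a Prometheus-ish text format.
        println!("zpool_all_healthy {}", pools_healthy()? as u8);
        println!("fma_no_active_faults {}", no_active_faults()? as u8);
        Ok(())
    }
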
19:12:11 oh, there is also a process that will send an event via smtp as well
19:12:43 annoyingly, i'm not seeing any man pages for any of this (though i just might have missed them)
19:13:21 https://github.com/illumos/illumos-gate/tree/master/usr/src/lib/fm/libfmd_snmp/mibs has the SNMP MIBs
19:13:58 also, https://github.com/illumos/illumos-gate/tree/master/usr/src/cmd/fm/notify/smtp-notify/common has the source for the smtp notification service
19:14:27 that could be used as a guide for talking to fmd via its library interface (which should be doable from rust, I would think)
21:58:55 [illumos-gate] 17517 loader: errors with pointer conversion -- Toomas Soome
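
[A related sketch for the SNMP route mentioned at 19:13:21, assuming the net-snmp agent is running with the fmd module loaded and SUN-FM-MIB from the linked tree is on the MIB search path. sunFmProblemTable is the table name as remembered from that MIB and should be checked against the MIB files; the community string and host are placeholders.]

    use std::process::Command;

    fn main() -> std::io::Result<()> {
        // Walk the FMA problem table exposed by the fmd SNMP module.
        // "public" and "localhost" are placeholder community/host values.
        let out = Command::new("snmpwalk")
            .args(["-v2c", "-c", "public", "-m", "SUN-FM-MIB",
                   "localhost", "sunFmProblemTable"])
            .output()?;
        // Each row in the walk corresponds to an open problem; an empty walk
        // means fmd is not reporting any active problems over SNMP.
        print!("{}", String::from_utf8_lossy(&out.stdout));
        Ok(())
    }
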