13:56:47 <aru> now that openbsd 7.4 is out, I wonder if I should retry getting it to run as a bhyve vm under omnios
15:05:27 <danmcd> nomad: --> Still looking for mpt_sas(4D) support for 3816 chipset?
15:11:48 <nomad> danmcd, I believe so.
15:11:59 <nomad> I don't think the ticket has seen any progress in a while.
15:12:05 <danmcd> If you're comfy compiling source check out:
15:12:20 <danmcd> https://kebe.com/~danmcd/webrevs/mpt_sas-38xx/
15:12:43 <danmcd> Grab the improved-38xx....patch file and slap it on to your favorite source build.
15:12:55 <nomad> I don't have any source builds
15:13:10 <nomad> :(
15:13:10 <danmcd> If you aren't comfy compiling, please gimme a chance and I'll compile an mpt_sas binary (which I'm pretty sure you've experience with replacing via alternate BEs.)
15:13:47 <nomad> I'm happy to do so.
15:14:34 <nomad> remind me, is that for the 3108 or the 9500?
15:14:49 <danmcd> 9500... 
15:15:09 <nomad> Which reminds me, I need to poke Silicon Mechanics again. They never answered my question about a cable for that card.
15:15:35 <nomad> right now I have the card installed in an old host but don't have any DASD connected because it uses a different connector than the 3xxx series cards. :/
15:16:10 <danmcd> "DASD" ==> now that's a term I've not heard in a long time.  (/me is having flashbacks to 1990-1992 summer jobs at IBM...)
15:16:29 <danmcd> What OmniOS are you running also?
15:16:51 <nomad> omnios-r151046-1c2c17cce7
15:17:24 <nomad> It's faster to type than "spinning rust" :)
15:19:00 <danmcd> HAH!
15:19:18 <danmcd> I'm building illumos-gate:master.  There was a header-file refactor, but that should not affect the binary.
15:19:30 <danmcd> You'll also need to modify /etc/driver_aliases in your with-new-driver BE.
15:19:40 <danmcd> You know how to do that (including `bootadm update-archive` ?)?
15:19:41 * nomad nods
15:20:57 <nomad> I believe I've done it exactly once so no, I wouldn't say I know how.
15:21:09 <nomad> but let's jump that hurdle when it's time to test the new binary :)
15:21:25 <aru> do you have a howto at hand? While I don't exactly need it (or at least not now),  it would be something to learn
15:46:08 <danmcd> Okay...
15:47:33 <danmcd> 1.) Download the binary:  https://kebe.com/~danmcd/webrevs/mpt_sas-38xx/mpt_sas (it's a non-DEBUG binary)
15:47:52 <danmcd> 2.) beadm create 38xx    ("38xx" can be a different name)
15:48:09 <danmcd> 3.) beadm mount 38xx /mnt    ("/mnt" can be any full path to an empty directory)
15:48:22 <danmcd> 4.) cp mpt_sas /mnt/kernel/drv/amd64/mpt_sas
15:48:31 <danmcd> 5.) vi /mnt/etc/driver_aliases
15:48:43 <danmcd> 6.) Find the mpt_sas lines and add:
15:49:03 <danmcd>     mpt_sas "pciex1000,e5"
15:49:13 <danmcd>     mpt_sas "pciex1000,e6"
15:49:26 <danmcd> 7.) bootadm update-archive -R /mnt
15:49:34 <danmcd> 8.) beadm umount 38xx
15:50:03 <danmcd> 9a.) Either reboot and select "38xx" by hand from your loader menu or "beadm activate -t 38xx"
15:50:17 <danmcd> 9b.) oops, "beadm activate -t 38xx" was 9b.
15:50:48 <danmcd> NOTE:  selecting from loader or using "-t"  will make it boot ONE TIME.   This allows for panics and other things not to trap you, but it means a subsequent reboot will go to your default BE.
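The steps above can be collected into one POSIX shell function. This is only a sketch: it assumes the binary from step 1 is already downloaded, it defaults to a dry run that prints each command instead of running it, and the names `run`, `add_alias`, `add_38xx_aliases`, `stage_driver_be`, and `DRY_RUN` are made up for illustration. The commands and alias lines themselves are the ones danmcd listed.

```shell
#!/bin/sh
# Sketch only. run, add_alias, add_38xx_aliases, stage_driver_be and DRY_RUN
# are hypothetical helper names, not part of beadm or bootadm.

DRY_RUN=${DRY_RUN:-1}   # default: print commands instead of running them

# In dry-run mode print the command; otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# add_alias FILE DRIVER ID: append 'DRIVER "ID"' unless that exact line exists.
add_alias() {
    file=$1
    line="$2 \"$3\""
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Steps 5-6: add the two 38xx aliases to the mounted BE's driver_aliases.
add_38xx_aliases() {
    aliases=$1/etc/driver_aliases
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ append mpt_sas pciex1000,e5 and pciex1000,e6 to $aliases"
    else
        add_alias "$aliases" mpt_sas "pciex1000,e5" &&
        add_alias "$aliases" mpt_sas "pciex1000,e6"
    fi
}

# stage_driver_be BE_NAME MOUNTPOINT DRIVER_BINARY
# Stops at the first failing step.
stage_driver_be() {
    be=$1 mnt=$2 bin=$3
    run beadm create "$be" &&                            # step 2
    run beadm mount "$be" "$mnt" &&                      # step 3
    run cp "$bin" "$mnt/kernel/drv/amd64/mpt_sas" &&     # step 4
    add_38xx_aliases "$mnt" &&                           # steps 5-6
    run bootadm update-archive -R "$mnt" &&              # step 7
    run beadm umount "$be" &&                            # step 8
    run beadm activate -t "$be"                          # step 9b (one-time boot)
}
```

Calling `stage_driver_be 38xx /mnt ./mpt_sas` prints the plan; setting `DRY_RUN=0` first (as root) would actually run it. Because the last step uses `-t`, a later reboot returns to the default BE, as the NOTE says.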
15:51:03 <danmcd> @aru and @nomad Anything unclear above ^^^ ?
15:51:16 <nomad> I believe I've followed along.
15:51:26 <nomad> I'm bringing up IPMI now so I can watch the boot, give me a minute or three.
15:51:28 <aru> thank you
15:53:26 <nomad> Is it a good sign or a bad sign that I have the IPMI password memorized? <sigh>
15:53:40 <nomad> rebooting into new BE
15:57:43 <nomad> "NOTICE: One or more I/O devices have been retired"
15:57:49 <nomad> I don't think I've noticed that notice before.
15:58:41 <danmcd> `fmadm faulty` is your (very verbose) friend here; pipe its output to less -M
15:59:30 <danmcd> also "prtconf -v | grep mpt_sas  " show anything?
16:00:50 <nomad> : || lvd@fs2 ~ [508] ; prtconf -v | grep mpt_sas
16:00:50 <nomad> : || lvd@fs2 ~ [509] ; 
16:01:07 <nomad> also, fmadm is saying:
16:01:15 <nomad> Fault class : fault.io.pciex.device-interr
16:01:15 <nomad> Affects     : dev:////pci@a5,0/pci8086,2030@0/pci1000,4060@0
16:01:15 <nomad>                   faulted and taken out of service
16:01:15 <nomad> FRU         : "CPU2 SLOT 5 PCI-E 3.0 X16" (hc://:product-id=iPS-42-336-EXP_(R518):server-id=fs2:chassis-id=SM121349/motherboard=0/hostbridge=6/pciexrc=6/pciexbus=175/pciexdev=0)
16:01:15 <nomad>                   faulty
16:03:32 <nomad> hmm... I thought it booted oddly, I'm also seeing:
16:03:40 <nomad> Description : The system has rebooted after a kernel panic.  Refer to
16:03:52 <nomad> (I looked away at the wrong moment)
16:04:04 <nomad> but it's in the right BE so I didn't think much of it.
16:04:07 <nomad> now I'm not sure.
16:04:18 <nomad> 38xx-test          N      /          66.51M  static 2023-10-16 08:48
16:04:50 <nomad> I'm going to boot again and watch it.
16:06:12 <danmcd> "N" ==> next and it's not temporary so if it panics you'll need to intercede at loader.
16:06:22 <danmcd> Oh wait... it's "N" isn't it?
16:06:53 <danmcd> "N" == now.  Dammit.
16:06:57 <danmcd>  represented by 'N'; active on reboot,
16:06:57 <danmcd>            represented by 'R'; or both, represented by 'NR'.
16:06:59 <nomad> the usual BE was showing R
16:07:02 <danmcd> (^^^ from man page).
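To keep the flags straight, here is a tiny shell sketch that spells out the Active column of `beadm list` using only the three values quoted from the man page. The function name `describe_active` is hypothetical, and any other value is simply reported back rather than interpreted.

```shell
# describe_active FLAGS: explain the Active column of `beadm list`.
# Only the N / R / NR meanings quoted from the man page are covered.
describe_active() {
    case $1 in
        NR) echo "active now and active on reboot" ;;
        N)  echo "active now" ;;
        R)  echo "active on reboot" ;;
        *)  echo "other flag: $1" ;;
    esac
}
```

For the `38xx-test` line nomad pasted, `describe_active N` prints "active now", which matches a BE that was booted once via `-t` while the usual BE keeps its `R`.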
16:07:17 <danmcd> You were booted into the 38xx-test BE.
16:07:22 <danmcd> Shoot.
16:07:23 * nomad nods
16:07:28 <nomad> that was the intent :)
16:07:31 <danmcd> Oh good.
16:07:32 <nomad> I'm rebooting into it again now.
16:07:46 <danmcd> Via loader?  Or did you mark it as "activate -t" again?
16:07:53 <nomad> beadm activate -t
16:07:55 <danmcd> Cool.
16:08:30 <nomad> I need to pull a bunch of these NICs out of this box. It takes for-freakin-ever to boot.
16:08:52 <nomad> finally got the splash screen
16:10:41 <nomad> it certainly takes longer than normal to go from the hostname: line to the reading ZFS config line.
16:11:41 <nomad> it didn't panic on this boot
16:11:53 <nomad> prtconf -v is still not seeing mpt_sas
16:12:51 <nomad> interesting... the change I made to kernel/drv/amd64/mpt_sas didn't stick.
16:12:55 <nomad> let me do that again and reboot
16:13:04 <nomad> er, never mind
16:13:06 * nomad sighs
16:13:16 * danmcd is off to lunch, pardon latency.
16:14:04 <nomad> 4e009433bdcc84efb95ae64cdf141372  /kernel/drv/amd64/mpt_sas
16:14:07 <nomad> that's the right one, right?
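One way to answer that without comparing hashes by eye is a byte-for-byte comparison between the downloaded file and the installed driver. A minimal sketch, assuming the download is still on disk; the function name `same_driver` is made up for illustration, and `cmp -s` is the standard silent compare.

```shell
# same_driver DOWNLOADED INSTALLED
# Succeeds only if the two files are byte-for-byte identical.
same_driver() {
    if cmp -s "$1" "$2"; then
        echo "match: $2"
    else
        echo "MISMATCH: $2 differs from $1"
        return 1
    fi
}
```

Run from inside the test BE, that would look like `same_driver ./mpt_sas /kernel/drv/amd64/mpt_sas`, where `./mpt_sas` is wherever the download was saved.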
16:25:04 <nomad> evidently the panic is old news. It shows up in the old BE as well.
16:26:25 <nomad> how do I clear all this noise in fmadm?
16:27:05 <nomad> (fmadm clear comes back with "illegal subcommand")
16:28:44 <nomad> ignore that thing about the failed PCI device, that's from 20 SEP.
16:28:54 <nomad> in fact, everything I'm seeing in fmadm is old news.
16:30:28 <nomad> fmadm repair seems to have done it.
16:30:38 <nomad> ok, ready to try creating the be and testing again.
16:39:16 <nomad> this time I didn't get the obsolete hardware warning. I got something that's going to take me a bit to transcribe (It's not showing up in dmesg)
16:40:01 <nomad> "Warning: /pci@a5,0/pci8086,2030@0/pci1000,4060 (mpt_sas3):
16:40:41 <nomad>   Number of phys reported by HBA SAS IO unit Page 0 (11) is greater than that reported by the manufacturing information (8). Driver phy count limited to 8. Please contact the firmware vendor about this."
16:40:58 <nomad> and:
16:41:05 <nomad> : || lvd@fs2 ~ [503] ; prtconf -v | grep -i mpt_sas
16:41:05 <nomad>                             value='mpt_sas'
16:41:42 <nomad> I wonder if something in those old incidents blocked it.
17:30:54 <danmcd> Interesting about that disconnect between page 0 and mfg information.  *probably* a NOP, as the 9500 board spec mentioned "3808" not "3816".
17:31:13 <danmcd> I use "fmadm acquit <UUID>" to get rid of things like kernel panic events.
17:31:36 <danmcd> Anyway, please share what you find when you hang disks off this controller?  I'd be interested, so would the illumos development list (there's a thread open about it).
17:31:56 <nomad> If I ever hear back from the vendor about the cable I'll be quite happy to test this.
17:32:07 <nomad> I'm quite peeved at them right now, obviously.
17:33:02 <nomad> I presume it would be foolish to install that driver in the primary BE.
17:34:10 <nomad> Do you need any other tests right now or should I return to primary?
17:36:14 <danmcd> nomad: If you're not using the device, having the driver is likely a  NOP unless there's SERIOUS SERIOUS attach-time bugs.
17:36:31 <danmcd> OTOH better safe than sorry.
17:36:32 <nomad> danmcd, I'm thinking in terms of future patches.
17:37:02 <nomad> I presume you aren't upstreaming this yet?
17:37:06 <danmcd> good point,  though honestly the next time mpt_sas gets a change will likely be the 38xx fix.
17:37:18 <danmcd> No upstreaming until it gets tested, and I'm in no position to deliver that testing.
17:37:26 <nomad> I wish I were <sigh>
17:37:28 <danmcd> It's why I'm sharing it with community; that way people can try it out.
17:37:44 <nomad> ok, returning to regular BE. Let me know if you need me to do anything else before I manage to get my hands on that cable.
17:38:03 <danmcd> Either that or Tintri/Nexenta gets more proactive in community interaction apart from, "Here's the code."  There's cultural issues I don't understand there.
17:38:06 <nomad> I'll leave the BE in case you need me to reboot into it again.
17:38:14 <danmcd> Good idea re keeping the BE around.
17:39:15 <nomad> Does the "don't have too many BEs" problem still exist? I've been carefully keeping the list small.
17:39:22 <nomad> which feels like cargo culting at this point.
17:40:33 <danmcd> Still might be a problem, but one spare shouldn't kill you.
17:42:38 <nomad> at this point I have several spares since I also did the pcie test stuff.
17:42:43 <nomad> but I can clean all that up.
18:22:14 <papertigers> I wasn't aware we had a too many BE problem.  Perhaps I should clean up my boxes
18:28:01 <danmcd> Oh wait... I'm remembering something about that now.  Damn, I think it was a GRUB limitation.
18:28:30 <nomad> it was long ago but I've been doing it since.
18:28:35 <nomad> Something about more than 4 causing problems.
18:29:15 <danmcd> Whoa.  Okay that's even older than my recollection.
18:31:16 <nomad> like I said, I feel like I've been cargo culting it for a long while.
18:59:27 <nomad> (I also don't like having lots of BEs laying around since I only use them for emergency purposes so having just the last few is sufficient and more than that can just lead to confusion.)
20:23:27 <danmcd> ^^^ I feel that same way nomad ^^^