13:56:47 now that openbsd 7.4 is out, I wonder if I should retry getting it to run as a bhyve vm under omnios
15:05:27 nomad: --> Still looking for mpt_sas(4D) support for 3816 chipset?
15:11:48 danmcd, I believe so.
15:11:59 I don't think the ticket has seen any progress in a while.
15:12:05 If you're comfy compiling source check out:
15:12:20 https://kebe.com/~danmcd/webrevs/mpt_sas-38xx/
15:12:43 Grab the improved-38xx....patch file and slap it on to your favorite source build.
15:12:55 I don't have any source builds
15:13:10 :(
15:13:10 If you aren't comfy compiling, please gimme a chance and I'll compile an mpt_sas binary (which I'm pretty sure you've experience with replacing via alternate BEs.)
15:13:47 I'm happy to do so.
15:14:34 remind me, is that for the 3108 or the 9500?
15:14:49 9500...
15:15:09 Which reminds me, I need to poke Silicon Mechanics again. They never answered my question about a cable for that card.
15:15:35 right now I have the card installed in an old host but don't have any DASD connected because it uses a different connector than the 3xxx series cards. :/
15:16:10 "DASD" ==> now that's a term I've not heard in a long time. (/me is having flashbacks to 1990-1992 summer jobs at IBM...)
15:16:29 What OmniOS are you running also?
15:16:51 omnios-r151046-1c2c17cce7
15:17:24 It's faster to type than "spinning rust" :)
15:19:00 HAH!
15:19:18 I'm building illumos-gate:master. There was a header-file refactor, but that should not affect the binary.
15:19:30 You'll also need to modify /etc/driver_aliases in your with-new-driver BE.
15:19:40 You know how to do that (including `bootadm update-archive`)?
15:19:41 * nomad nods
15:20:57 I believe I've done it exactly once so no, I wouldn't say I know how.
15:21:09 but let's jump that hurdle when it's time to test the new binary :)
15:21:25 do you have a howto at hand? While I don't exactly need it (or at least not now), it would be something to learn
15:46:08 Okay...
15:47:33 1.) Download the binary: https://kebe.com/~danmcd/webrevs/mpt_sas-38xx/mpt_sas (it's a non-DEBUG binary)
15:47:52 2.) beadm create 38xx ("38xx" can be a different name)
15:48:09 3.) beadm mount 38xx /mnt ("/mnt" can be any full path to an empty directory)
15:48:22 4.) cp mpt_sas /mnt/kernel/drv/amd64/mpt_sas
15:48:31 5.) vi /mnt/etc/driver_aliases
15:48:43 6.) Find the mpt_sas lines and add:
15:49:03 mpt_sas "pciex1000,e5"
15:49:13 mpt_sas "pciex1000,e6"
15:49:26 7.) bootadm update-archive -R /mnt
15:49:34 8.) beadm umount 38xx
15:50:03 9a.) Either reboot and select "38xx" by hand from your loader menu or "beadm activate -t 38xx"
15:50:17 9b.) oops, "beadm activate -t 38xx" was 9b.
15:50:48 NOTE: selecting from loader or using "-t" will make it boot ONE TIME. This allows for panics and other things not to trap you, but it means a subsequent reboot will go to your default BE.
15:51:03 @aru and @nomad Anything unclear above ^^^ ?
15:51:16 I believe I've followed along.
15:51:26 I'm bringing up IPMI now so I can watch the boot, give me a minute or three.
15:51:28 thank you
15:53:26 Is it a good sign or a bad sign that I have the IPMI password memorized?
15:53:40 rebooting into new BE
15:57:43 "NOTICE: One or more I/O devices have been retired"
15:57:49 I don't think I've noticed that notice before.
15:58:41 `fmadm faulty` is your (very verbose) friend here; pipe its output to less -M
15:59:30 also "prtconf -v | grep mpt_sas " show anything?
16:00:50 : || lvd@fs2 ~ [508] ; prtconf -v | grep mpt_sas
16:00:50 : || lvd@fs2 ~ [509] ;
16:01:07 also, fmadm is saying:
16:01:15 Fault class : fault.io.pciex.device-interr
16:01:15 Affects : dev:////pci@a5,0/pci8086,2030@0/pci1000,4060@0
16:01:15 faulted and taken out of service
16:01:15 FRU : "CPU2 SLOT 5 PCI-E 3.0 X16" (hc://:product-id=iPS-42-336-EXP_(R518):server-id=fs2:chassis-id=SM121349/motherboard=0/hostbridge=6/pciexrc=6/pciexbus=175/pciexdev=0)
16:01:15 faulty
16:03:32 hmm... I thought it booted oddly, I'm also seeing:
16:03:40 Description : The system has rebooted after a kernel panic. Refer to
16:03:52 (I looked away at the wrong moment)
16:04:04 but it's in the right BE so I didn't think much of it.
16:04:07 now I'm not sure.
16:04:18 38xx-test N / 66.51M static 2023-10-16 08:48
16:04:50 I'm going to boot again and watch it.
16:06:12 "N" ==> next and it's not temporary so if it panics you'll need to intercede at loader.
16:06:22 Oh wait... it's "N" isn't it?
16:06:53 "N" == now. Dammit.
16:06:57 represented by 'N'; active on reboot,
16:06:57 represented by 'R'; or both, represented by 'NR'.
16:06:59 the usual BE was showing R
16:07:02 (^^^ from man page).
16:07:17 You were booted into the 38xx-test BE.
16:07:22 Shoot.
16:07:23 * nomad nods
16:07:28 that was the intent :)
16:07:31 Oh good.
16:07:32 I'm rebooting into it again now.
16:07:46 Via loader? Or did you mark it as "activate -t" again?
16:07:53 beadm activate -t
16:07:55 Cool.
16:08:30 I need to pull a bunch of these NICs out of this box. It takes for-freakin-ever to boot.
16:08:52 finally got the splash screen
16:10:41 it certainly takes longer than normal to go from the hostname: line to the reading ZFS config line.
16:11:41 it didn't panic on this boot
16:11:53 prtconf -v is still not seeing mpt_sas
16:12:51 interesting... the change I made to kernel/drv/amd64/mpt_sas didn't stick.
16:12:55 let me do that again and reboot
16:13:04 er, never mind
16:13:06 * nomad sighs
16:13:16 * danmcd is off to lunch, pardon latency.
16:14:04 4e009433bdcc84efb95ae64cdf141372 /kernel/drv/amd64/mpt_sas
16:14:07 that's the right one, right?
16:25:04 evidently the panic is old news. It shows up in the old BE as well.
16:26:25 how do I clear all this noise in fmadm?
16:27:05 (fmadm clear comes back with "illegal subcommand")
16:28:44 ignore that thing about the failed PCI device, that's from 20 SEP.
16:28:54 in fact, everything I'm seeing in fmadm is old news.
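Steps 5 and 6 of the how-to above edit /mnt/etc/driver_aliases by hand in vi. A minimal sketch of the same edit done idempotently, so that repeating it on a re-created BE does not duplicate entries, might look like the following. The function name `add_aliases` and the choice to place the new entries right after the last existing mpt_sas line are assumptions made for illustration; this is not part of beadm, bootadm, or any other illumos tool, and only the two alias strings come from the log.

```python
# Hypothetical helper, not an illumos utility: adds the two alias lines from
# step 6 to driver_aliases content unless they are already present.
ALIASES = ('mpt_sas "pciex1000,e5"', 'mpt_sas "pciex1000,e6"')


def add_aliases(text, aliases=ALIASES):
    """Return driver_aliases content with any missing alias lines inserted."""
    lines = text.splitlines()
    present = {line.strip() for line in lines}
    missing = [alias for alias in aliases if alias not in present]
    # Index of the last existing mpt_sas entry; fall back to the end of the file.
    last = max(
        (i for i, line in enumerate(lines) if line.split()[:1] == ["mpt_sas"]),
        default=len(lines) - 1,
    )
    lines[last + 1:last + 1] = missing
    return "\n".join(lines) + "\n"
```

In the workflow above this would be applied to the contents of /mnt/etc/driver_aliases (read the file, pass the text through `add_aliases`, write it back) before running `bootadm update-archive -R /mnt` in step 7.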
16:30:28 fmadm repair seems to have done it.
16:30:38 ok, ready to try creating the be and testing again.
16:39:16 this time I didn't get the obsolete hardware warning. I got something that's going to take me a bit to transcribe (It's not showing up in dmesg)
16:40:01 "Warning: /pci@a5,0/pci8086,2030@0/pci1000,4060 (mpt_sas3):
16:40:41 Number of phys reported by HBA SAS IO unit Page 0 (11) is greater than that reported by the manufacturing information (8). Driver phy count limited to 8. Please contact the firmware vendor about this."
16:40:58 and:
16:41:05 : || lvd@fs2 ~ [503] ; prtconf -v | grep -i mpt_sas
16:41:05 value='mpt_sas'
16:41:42 I wonder if something in those old incidents blocked it.
17:30:54 Interesting about that disconnect between page 0 and mfg information. *probably* a NOP, as the 9500 board spec mentioned "3808" not "3816".
17:31:13 I use "fmadm acquit " to get rid of things like kernel panic events.
17:31:36 Anyway, please share what you find when you hang disks off this controller? I'd be interested, so would the illumos development list (there's a thread open about it).
17:31:56 If I ever hear back from the vendor about the cable I'll be quite happy to test this.
17:32:07 I'm quite peeved at them right now, obviously.
17:33:02 I presume it would be foolish to install that driver in the primary BE.
17:34:10 Do you need any other tests right now or should I return to primary?
17:36:14 nomad: If you're not using the device, having the driver is likely a NOP unless there's SERIOUS SERIOUS attach-time bugs.
17:36:31 OTOH better safe than sorry.
17:36:32 danmcd, I'm thinking in terms of future patches.
17:37:02 I presume you aren't upstreaming this yet?
17:37:06 good point, though honestly the next time mpt_sas gets a change will likely be the 38xx fix.
17:37:18 No upstreaming until it gets tested, and I'm in no position to deliver that testing.
17:37:26 I wish I were
17:37:28 It's why I'm sharing it with community; that way people can try it out.
17:37:44 ok, returning to regular BE. Let me know if you need me to do anything else before I manage to get my hands on that cable.
17:38:03 Either that or Tintri/Nexenta gets more proactive in community interaction apart from, "Here's the code." There's cultural issues I don't understand there.
17:38:06 I'll leave the BE in case you need me to reboot into it again.
17:38:14 Good idea re keeping the BE around.
17:39:15 Does the "don't have too many BEs" problem still exist? I've been carefully keeping the list small.
17:39:22 which feels like cargo culting at this point.
17:40:33 Still might be a problem, but one spare shouldn't kill you.
17:42:38 at this point I have several spares since I also did the pcie test stuff.
17:42:43 but I can clean all that up.
18:22:14 I wasn't aware we had a too many BE problem. Perhaps I should clean up my boxes
18:28:01 Oh wait... I'm remembering something about that now. Damn, I think it was a GRUB limitation.
18:28:30 it was long ago but I've been doing it since.
18:28:35 Something about more than 4 causing problems.
18:29:15 Whoa. Okay that's even older than my recollection.
18:31:16 like I said, I feel like I've been cargo culting it for a long while.
18:59:27 (I also don't like having lots of BEs laying around since I only use them for emergency purposes so having just the last few is sufficient and more than that can just lead to confusion.)
20:23:27 ^^^ I feel that same way nomad ^^^
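When cleaning up spare BEs, the flags in the second column of the BE listing pasted earlier are what tell a spare from a BE in use: per the man page text quoted in the log, 'N' means active now, 'R' means active on reboot, and 'NR' means both. A small sketch of reading one such row follows. The function name `parse_be_row` is hypothetical, the whitespace-separated column layout is assumed from the single row pasted in the log, and any flags other than N and R are ignored here.

```python
# Hypothetical parser for one row of a BE listing, assuming the layout seen
# in the log: "<name> <active flags> <mountpoint> <space> <policy> <created>".
def parse_be_row(row):
    """Return the BE name and whether it is active now and/or on reboot."""
    name, active = row.split()[:2]
    return {
        "name": name,
        "now": "N" in active,        # currently booted BE
        "on_reboot": "R" in active,  # BE that the next plain reboot will use
    }


row = "38xx-test N / 66.51M static 2023-10-16 08:48"
info = parse_be_row(row)
print(info["name"], info["now"], info["on_reboot"])
```

Under this reading, a BE with neither flag set is neither running nor scheduled for the next boot, which makes it a candidate for removal once it is no longer needed for testing.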