-
aru
now that openbsd 7.4 is out, I wonder if I should retry getting it to run as a bhyve vm under omnios
-
danmcd
nomad: --> Still looking for mpt_sas(4D) support for 3816 chipset?
-
nomad
danmcd, I believe so.
-
nomad
I don't think the ticket has seen any progress in a while.
-
danmcd
If you're comfy compiling source check out:
-
danmcd
-
danmcd
Grab the improved-38xx....patch file and slap it on to your favorite source build.
-
nomad
I don't have any source builds
-
nomad
:(
-
danmcd
If you aren't comfy compiling, please gimme a chance and I'll compile an mpt_sas binary (which I'm pretty sure you've experience with replacing via alternate BEs.)
-
nomad
I'm happy to do so.
-
nomad
remind me, is that for the 3108 or the 9500?
-
danmcd
9500...
-
nomad
Which reminds me, I need to poke Silicon Mechanics again. They never answered my question about a cable for that card.
-
nomad
right now I have the card installed in an old host but don't have any DASD connected because it uses a different connector than the 3xxx series cards. :/
-
danmcd
"DASD" ==> now that's a term I've not heard in a long time. (/me is having flashbacks to 1990-1992 summer jobs at IBM...)
-
danmcd
What OmniOS are you running also?
-
nomad
omnios-r151046-1c2c17cce7
-
nomad
It's faster to type than "spinning rust" :)
-
danmcd
HAH!
-
danmcd
I'm building illumos-gate:master. There was a header-file refactor, but that should not affect the binary.
-
danmcd
You'll also need to modify /etc/driver_aliases in your with-new-driver BE.
-
danmcd
You know how to do that (including `bootadm update-archive` ?)?
-
» nomad nods
-
nomad
I believe I've done it exactly once so no, I wouldn't say I know how.
-
nomad
but let's jump that hurdle when it's time to test the new binary :)
-
aru
do you have a howto at hand? While I don't exactly need it (or at least not now), it would be something to learn
-
danmcd
Okay...
-
danmcd
1.) Download the binary:
kebe.com/~danmcd/webrevs/mpt_sas-38xx/mpt_sas (it's a non-DEBUG binary)
-
danmcd
2.) beadm create 38xx ("38xx" can be a different name)
-
danmcd
3.) beadm mount 38xx /mnt ("/mnt" can be any full path to an empty directory)
-
danmcd
4.) cp mpt_sas /mnt/kernel/drv/amd64/mpt_sas
-
danmcd
5.) vi /mnt/etc/driver_aliases
-
danmcd
6.) Find the mpt_sas lines and add:
-
danmcd
mpt_sas "pciex1000,e5"
-
danmcd
mpt_sas "pciex1000,e6"
-
danmcd
7.) bootadm update-archive -R /mnt
-
danmcd
8.) beadm umount 38xx
-
danmcd
9a.) Either reboot and select "38xx" by hand from your loader menu or "beadm activate -t 38xx"
-
danmcd
9b.) oops, "beadm activate -t 38xx" was 9b.
-
danmcd
NOTE: selecting from loader or using "-t" will make it boot ONE TIME. This allows for panics and other things not to trap you, but it means a subsequent reboot will go to your default BE.
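
Steps 2 through 9b above can be sketched as one script. This is a minimal sketch, not something danmcd posted. The helper names `add_alias` and `stage_driver` are hypothetical. It assumes the binary from step 1 is already downloaded, that it runs as root, and that appending the alias lines to the end of driver_aliases works as well as adding them next to the existing mpt_sas lines.

```shell
#!/bin/sh
# add_alias FILE DRIVER ALIAS (hypothetical helper)
# Append 'DRIVER "ALIAS"' to FILE unless that exact line is already present.
add_alias() {
    file=$1
    line="$2 \"$3\""
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# stage_driver BE MOUNTPOINT BINARY (hypothetical helper, run as root)
# Mirrors steps 2-9b: new BE, copy driver, add aliases, rebuild the boot
# archive, unmount, then activate the BE for ONE boot only (-t).
stage_driver() {
    be=$1
    mnt=$2
    bin=$3
    beadm create "$be" &&
    beadm mount "$be" "$mnt" &&
    cp "$bin" "$mnt/kernel/drv/amd64/mpt_sas" &&
    add_alias "$mnt/etc/driver_aliases" mpt_sas "pciex1000,e5" &&
    add_alias "$mnt/etc/driver_aliases" mpt_sas "pciex1000,e6" &&
    bootadm update-archive -R "$mnt" &&
    beadm umount "$be" &&
    beadm activate -t "$be"
}

# Example (not run here): stage_driver 38xx /mnt ./mpt_sas
```

Because of the `-t`, a panic in the test BE still lands you back in the default BE on the following reboot.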
-
danmcd
@aru and @nomad Anything unclear above ^^^ ?
-
nomad
I believe I've followed along.
-
nomad
I'm bringing up IPMI now so I can watch the boot, give me a minute or three.
-
aru
thank you
-
nomad
Is it a good sign or a bad sign that I have the IPMI password memorized? <sigh>
-
nomad
rebooting into new BE
-
nomad
"NOTICE: One or more I/O devices have been retired"
-
nomad
I don't think I've noticed that notice before.
-
danmcd
`fmadm faulty` is your (very verbose) friend here; pipe its output to less -M
-
danmcd
also "prtconf -v | grep mpt_sas " show anything?
-
nomad
: || lvd@fs2 ~ [508] ; prtconf -v | grep mpt_sas
-
nomad
: || lvd@fs2 ~ [509] ;
-
nomad
also, fmadm is saying:
-
nomad
Fault class : fault.io.pciex.device-interr
-
nomad
Affects : dev:////pci@a5,0/pci8086,2030@0/pci1000,4060@0
-
nomad
faulted and taken out of service
-
nomad
FRU : "CPU2 SLOT 5 PCI-E 3.0 X16" (hc://:product-id=iPS-42-336-EXP_(R518):server-id=fs2:chassis-id=SM121349/motherboard=0/hostbridge=6/pciexrc=6/pciexbus=175/pciexdev=0)
-
nomad
faulty
-
nomad
hmm... I thought it booted oddly, I'm also seeing:
-
nomad
Description : The system has rebooted after a kernel panic. Refer to
-
nomad
(I looked away at the wrong moment)
-
nomad
but it's in the right BE so I didn't think much of it.
-
nomad
now I'm not sure.
-
nomad
38xx-test N / 66.51M static 2023-10-16 08:48
-
nomad
I'm going to boot again and watch it.
-
danmcd
"N" ==> next and it's not temporary so if it panics you'll need to intercede at loader.
-
danmcd
Oh wait... it's "N" isn't it?
-
danmcd
"N" == now. Dammit.
-
danmcd
represented by 'N'; active on reboot,
-
danmcd
represented by 'R'; or both, represented by 'NR'.
-
nomad
the usual BE was showing R
-
danmcd
(^^^ from man page).
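
The N/R flags quoted from the man page above can be decoded with a small helper. This is a rough sketch: `describe_active` and `list_active` are hypothetical names, and the field order assumed for `beadm list -H` (name;uuid;active;...) should be checked against your own output before relying on it. Flags other than N and R are reported as unknown rather than guessed at.

```shell
#!/bin/sh
# describe_active FLAGS (hypothetical helper)
# Translate the "Active" column of `beadm list` into words.
describe_active() {
    case $1 in
        NR)   echo "active now and on reboot" ;;
        N)    echo "active now" ;;
        R)    echo "active on reboot" ;;
        ''|-) echo "inactive" ;;
        *)    echo "unknown flags: $1" ;;
    esac
}

# list_active (hypothetical helper, needs beadm)
# Assumes -H prints semicolon-separated fields with the BE name first
# and the active flags third.
list_active() {
    beadm list -H | while IFS=';' read -r name uuid active rest; do
        printf '%s: %s\n' "$name" "$(describe_active "$active")"
    done
}
```

So a BE showing only "N" is the one you are running right now, and the one showing "R" is where the next plain reboot goes.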
-
danmcd
You were booted into the 38xx-test BE.
-
danmcd
Shoot.
-
» nomad nods
-
nomad
that was the intent :)
-
danmcd
Oh good.
-
nomad
I'm rebooting into it again now.
-
danmcd
Via loader? Or did you mark it as "activate -t" again?
-
nomad
beadm activate -t
-
danmcd
Cool.
-
nomad
I need to pull a bunch of these NICs out of this box. It takes for-freakin-ever to boot.
-
nomad
finally got the splash screen
-
nomad
it certainly takes longer than normal to go from the hostname: line to the reading ZFS config line.
-
nomad
it didn't panic on this boot
-
nomad
prtconf -v is still not seeing mpt_sas
-
nomad
interesting... the change I made to kernel/drv/amd64/mpt_sas didn't stick.
-
nomad
let me do that again and reboot
-
nomad
er, never mind
-
» nomad sighs
-
» danmcd is off to lunch, pardon latency.
-
nomad
4e009433bdcc84efb95ae64cdf141372 /kernel/drv/amd64/mpt_sas
-
nomad
that's the right one, right?
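
One way to answer that without trusting a pasted hash is to compare the installed driver byte for byte against the downloaded copy. A minimal sketch, assuming the download is still on disk; `same_file` is a hypothetical helper and `./mpt_sas` is a placeholder path.

```shell
#!/bin/sh
# same_file A B (hypothetical helper)
# Print "match" if the two files are byte-identical, "differ" otherwise.
same_file() {
    if cmp -s "$1" "$2"; then
        echo "match"
    else
        echo "differ"
    fi
}

# Example (not run here), from inside the test BE:
#   same_file ./mpt_sas /kernel/drv/amd64/mpt_sas
# On illumos, `digest -a md5 FILE` prints a checksum if you would rather
# compare hashes.
```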
-
nomad
evidently the panic is old news. It shows up in the old BE as well.
-
nomad
how do I clear all this noise in fmadm?
-
nomad
(fmadm clear comes back with "illegal subcommand")
-
nomad
ignore that thing about the failed PCI device, that's from 20 SEP.
-
nomad
in fact, everything I'm seeing in fmadm is old news.
-
nomad
fmadm repair seems to have done it.
-
nomad
ok, ready to try creating the be and testing again.
-
nomad
this time I didn't get the obsolete hardware warning. I got something that's going to take me a bit to transcribe (It's not showing up in dmesg)
-
nomad
"Warning: /pci@a5,0/pci8086,2030@0/pci1000,4060 (mpt_sas3):
-
nomad
Number of phys reported by HBA SAS IO unit Page 0 (11) is greater than that reported by the manufacturing information (8). Driver phy count limited to 8. Please contact the firmware vendor about this."
-
nomad
and:
-
nomad
: || lvd@fs2 ~ [503] ; prtconf -v | grep -i mpt_sas
-
nomad
value='mpt_sas'
-
nomad
I wonder if something in those old incidents blocked it.
-
danmcd
Interesting about that disconnect between page 0 and mfg information. *probably* a NOP, as the 9500 board spec mentioned "3808" not "3816".
-
danmcd
I use "fmadm acquit <UUID>" to get rid of things like kernel panic events.
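
To find the UUIDs to hand to acquit, something like the following might help. It is only a sketch: `extract_uuids` is a hypothetical helper that pulls UUID-shaped tokens out of whatever text it is fed, on the assumption that the event IDs in `fmadm faulty` output are lowercase UUIDs separated by whitespace. Read the full report before acquitting anything.

```shell
#!/bin/sh
# extract_uuids (hypothetical helper)
# Read text on stdin, print each distinct UUID-shaped token once.
extract_uuids() {
    tr -s ' \t' '\n\n' |
        grep -E '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' |
        sort -u
}

# Example (not run here, needs root):
#   fmadm faulty | extract_uuids
#   fmadm acquit "$uuid"    # one reviewed UUID at a time
```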
-
danmcd
Anyway, please share what you find when you hang disks off this controller? I'd be interested, so would the illumos development list (there's a thread open about it).
-
nomad
If I ever hear back from the vendor about the cable I'll be quite happy to test this.
-
nomad
I'm quite peeved at them right now, obviously.
-
nomad
I presume it would be foolish to install that driver in the primary BE.
-
nomad
Do you need any other tests right now or should I return to primary?
-
danmcd
nomad: If you're not using the device, having the driver is likely a NOP unless there's SERIOUS SERIOUS attach-time bugs.
-
danmcd
OTOH better safe than sorry.
-
nomad
danmcd, I'm thinking in terms of future patches.
-
nomad
I presume you aren't upstreaming this yet?
-
danmcd
good point, though honestly the next time mpt_sas gets a change will likely be the 38xx fix.
-
danmcd
No upstreaming until it gets tested, and I'm in no position to deliver that testing.
-
nomad
I wish I were <sigh>
-
danmcd
It's why I'm sharing it with community; that way people can try it out.
-
nomad
ok, returning to regular BE. Let me know if you need me to do anything else before I manage to get my hands on that cable.
-
danmcd
Either that or Tintri/Nexenta gets more proactive in community interaction apart from, "Here's the code." There's cultural issues I don't understand there.
-
nomad
I'll leave the BE in case you need me to reboot into it again.
-
danmcd
Good idea re keeping the BE around.
-
nomad
Does the "don't have too many BEs" problem still exist? I've been carefully keeping the list small.
-
nomad
which feels like cargo culting at this point.
-
danmcd
Still might be a problem, but one spare shouldn't kill you.
-
nomad
at this point I have several spares since I also did the pcie test stuff.
-
nomad
but I can clean all that up.
-
papertigers
I wasn't aware we had a too many BE problem. Perhaps I should clean up my boxes
-
danmcd
Oh wait... I'm remembering something about that now. Damn, I think it was a GRUB limitation.
-
nomad
it was long ago but I've been doing it since.
-
nomad
Something about more than 4 causing problems.
-
danmcd
Whoa. Okay that's even older than my recollection.
-
nomad
like I said, I feel like I've been cargo culting it for a long while.
-
nomad
(I also don't like having lots of BEs laying around since I only use them for emergency purposes so having just the last few is sufficient and more than that can just lead to confusion.)
-
danmcd
^^^ I feel that same way nomad ^^^