-
jclulow
Yeah not just yet, unfortunately -- it's not quite finished haha
-
jbk
sounds like my tpm driver :)
-
KungFuJesus
hmm, boot seems to be hanging on "configuring devices" after a giant leap in updates
-
KungFuJesus
anything I can do to diagnose this? Verbose mode doesn't print any more info after configuring devices
-
rmustacc
First thing I usually try to do here is boot into kmdb and then inject an nmi and see if that does it.
-
rmustacc
And then I can inspect state.
-
KungFuJesus
ok, so I saw the kmdb option on the bootloader
-
KungFuJesus
I can boot to that, then can you give me the command?
-
rmustacc
Command for what?
-
rmustacc
What to look at?
-
KungFuJesus
yeah, what to do in kmdb
-
rmustacc
I can give some high level bits, I am not really around IRC much today.
-
KungFuJesus
I assume you mean the option from the bootloader option that says: "kmdb mode"?
-
KungFuJesus
err s/option/g
-
KungFuJesus
fwiw I can boot into the ancient boot environment still because it doesn't attempt a device reconfigure
-
rmustacc
There is the kmdb mode that says something like 'At Boot'?
-
rmustacc
IIRC.
-
KungFuJesus
yep, found "On Boot"
-
KungFuJesus
anything else I should set while in the boot loader?
-
rmustacc
Nothing really comes to mind given your comments about verbose mode.
-
KungFuJesus
ok, I'll boot and let you know when I'm at the prompt. Sounds like it should be immediately
-
KungFuJesus
yep, I'm there
-
rmustacc
Yeah, when you do just hit ':c'.
-
rmustacc
Then when it hangs, try to inject an NMI and see if that drops you back into kmdb.
-
rmustacc
That'd be via something like ipmitool chassis power diag IIRC.
-
KungFuJesus
Hmm, not sure I have a machine with ipmitool, let me see if I can find it from the web UI
-
rmustacc
If you do get in, the things I'd look at are ::cpuinfo, $C, ::stacks, etc.
-
rmustacc
If that doesn't work, there are much more manual ways to go through it all using moddebug and breakpoints.
-
KungFuJesus
did this:
-
KungFuJesus
Adam Stylinski 10:43 AM
-
KungFuJesus
[astylinski@fedoravm ~]$ ipmitool -H 192.168.8.45 -U ADMIN -P ADMIN chassis power diag
-
KungFuJesus
Chassis Power Control: Diag
-
KungFuJesus
didn't drop me to mdb, though
-
KungFuJesus
there was another option for mdb for "on NMI" rather than on boot
-
KungFuJesus
did I need that instead?
-
KungFuJesus
I'll try that
-
KungFuJesus
looks like ipmitool is sending it but this Supermicro BMC is deaf to it for whatever reason
-
KungFuJesus
it is _fairly_ old. I'll drop back to the mdb prompt at boot for the long route
-
rmustacc
No, you wouldn't need to switch that. There are reasons why it may not work.
-
KungFuJesus
alright, back at the prompt
-
KungFuJesus
there's no way to set a magic key sequence? Can I get a stop-a, heh?
-
Agnar
on x86 iirc there was also F1-a on the physical keyboard, but I never used it. Also Pause-A should work iirc
-
KungFuJesus
I'll give it a try
-
Agnar
but your BMC should be able to send an NMI to the system, that should do the trick
-
Agnar
if it doesn't work, it's probably stuck somewhere weird. I also have a Dell laptop that I can't break into mdb on
-
KungFuJesus
yeah those things aren't working either through the virtual keyboard or my physical one (I'm stuck dealing with iKVM for this thing)
-
Agnar
is there a way to send an NMI through the BMC's CLI?
-
KungFuJesus
supermicro claims the chassis power diag should do it
-
KungFuJesus
though, I'm uncertain that works for everything supermicro
-
Agnar
hmm.
-
KungFuJesus
hmm, there may be an option to enable it in the bios? Let me consult that
-
rmustacc
If an NMI doesn't work the alternate break is not going to get you out of it.
-
rmustacc
You can verify NMI functionality by booting to your old BE and confirming.
-
KungFuJesus
I have the displeasure of a BIOS with a bug where, post 2020, it will not let you enter it. So to get into it, you have to set the system clock from the OS to something prior to December 2020 and then reboot
-
Agnar
KungFuJesus: oh Jesus, you really have a hard time:/
-
KungFuJesus
Yeah, thanks AMI
-
KungFuJesus
I'm working with one of the IT people on prem; he may be able to update the BIOS so we can see whether NMI is disabled (which, from what I can gather, it probably is, supposedly due to an issue in the FBSD boot process that hangs with an NMI watchdog). Which, you know, is illumos' loader too, so that could be fun
-
rmustacc
Those are different things.
-
rmustacc
If you want to test the nmi, just go to your old BE and inject it.
-
rmustacc
We can definitely hang in ways that an NMI won't matter.
-
KungFuJesus
via ipmitool once booted? I can give it a shot
-
rmustacc
Yes.
-
KungFuJesus
yeah, ipmitool NMI injection doesn't do anything in the old BE
-
KungFuJesus
it really does seem like the "configuring devices" bit is the reason it hangs. I also have a disk in there that it seems to complain about with a missing GUID
-
KungFuJesus
Should I try "reconfigure" from the boot loader for the old boot environment, or is that a recipe for not being able to boot into that?
-
KungFuJesus
ok, this is interesting. I was able to send it from the localhost
-
KungFuJesus
it's doing something but I'm not dropping to the mdb prompt, I just keep getting a bunch of diagnostic messages about log info 0x31111000 from a disk on the SAS controller
-
KungFuJesus
hmm, got it booting. Updated the 9211 firmware to something not ancient and relaunched the update process into a new boot environment
-
jbk
LSI?
-
sjorge
jclulow: I messed with it briefly in a VM; it at least seemed to correctly probe one of my devices that the current driver doesn't support. Didn't actually try to run the software that talks to it, though
-
sjorge
so at a quick glance it seems better than the current driver
-
KungFuJesus
jbk: yes, it was in P15, I updated to P20
-
KungFuJesus
hard to say _exactly_ if that's what fixed it, but there was the perfect storm of a disk on the controller there that didn't present a WWN
-
jbk
I swear there's a bug in the driver where on resets/timeouts it's losing I/Os somehow, but haven't had time to dig into it
-
KungFuJesus
and another thing that sort of hinted at it is when I sent an NMI locally (remote doesn't work for some reason?) it just spammed the console about log info 0x31111000 on one of the targets on one of the SAS controllers
-
KungFuJesus
that disk should probably be removed to eliminate future headaches, I don't know why it's such a problem child
-
KungFuJesus
It'd have likely booted if I removed that disk, is my guess. I also removed the nvidia driver so that it'd stop complaining about it at boot, but I suspect that was a red herring and harmless
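
For reference, the two NMI injection paths tried in this log (over the network to the BMC, which did nothing here, and locally on the host, which did reach the kernel) can be sketched roughly as below. This is only a dry-run illustration: `nmi_cmd` is a hypothetical helper name, it prints the command rather than running it, and the local form assumes the host has a working in-band IPMI interface and enough privileges to use it.

```shell
#!/bin/sh
# Dry-run sketch: print the ipmitool invocation instead of executing it.
# nmi_cmd is a hypothetical helper, not part of ipmitool.
nmi_cmd() {
    if [ "$#" -eq 0 ]; then
        # In-band: run on the host itself through its local IPMI interface.
        echo "ipmitool chassis power diag"
    else
        # Out-of-band: talk to the BMC over the network, as in the log.
        # With -P omitted, ipmitool prompts for the password.
        echo "ipmitool -H $1 -U $2 chassis power diag"
    fi
}

nmi_cmd                      # local form
nmi_cmd 192.168.8.45 ADMIN   # remote form, host and user taken from the log
```

Whether either form actually delivers an NMI depends on the BMC and firmware, as the log shows.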