-
jvl
danmcd: Hi dan, just sent you (by email) the output for the microcode update on the Intel(r) Xeon(r) Silver 4310 CPU @ 2.10GHz (updated from 0xd0003a5 to 0xd0003b9). The box also has 10GigE NICs with the i40e driver (Intel X710). I remember something in smartos-discuss, but I'm blanking on the details ... I can test something here if necessary too.
-
danmcd
Looks like a good ucode update to me!
-
danmcd
As for i40e... the problem part there is the X722 + 10GBaseT (if not the X722 altogether, but I never had X722 + SFP to test).
-
danmcd
X710 should be fine.
-
danmcd
(If a little memory-hoggy, but that's an i40e problem per se, and it's being looked at).
-
jbk
by who?
-
jvl
danmcd: all okay then!
-
jbk
(just want to be sure there's no overlap :P)
-
jvl
lol, zpool on that dell 750xs is dead
-
jvl
30 minutes after install
-
danmcd
jbk --> you. :)
-
jbk
heh ok.. I do have a patch for that if anyone wants to test... I should ask Marsell how it's been going.. it's been quiet which I hope is good
-
danmcd
jvl: Errors but not actual corruption. I'm curious as to how fast these counters move up?
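
One way to put a number on "how fast" is to save two `zpool status` outputs some seconds apart and diff the READ/WRITE/CKSUM columns. What follows is a minimal sketch, not something anyone in this conversation ran: the helper names `parse_counters` and `rates` are made up for illustration, and it assumes the usual `NAME STATE READ WRITE CKSUM` layout with plain integer counters (rows with abbreviated values such as `1.2K` are simply skipped).

```python
import re

# A vdev row of `zpool status`: NAME STATE READ WRITE CKSUM [trailing note].
# Assumption: counters are plain integers; abbreviated ones will not match.
ROW = re.compile(r"^\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s.*)?$")


def parse_counters(status_text):
    """Return {vdev_name: (read, write, cksum)} parsed from `zpool status` text."""
    counters = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            name, _state, rd, wr, ck = m.groups()
            counters[name] = (int(rd), int(wr), int(ck))
    return counters


def rates(before, after, seconds):
    """Per-second growth of each counter between two parsed samples."""
    out = {}
    for name, new in after.items():
        # A vdev missing from the first sample is treated as starting at zero.
        old = before.get(name, (0, 0, 0))
        out[name] = tuple((n - o) / seconds for n, o in zip(new, old))
    return out
```

Feeding it two captures taken 60 seconds apart, `rates(parse_counters(first), parse_counters(second), 60)` gives errors per second for each vdev, which shows whether the counters are creeping or racing.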
-
jbk
we've been using it internally for a while, and hopefully will have it available to our customers soon
-
danmcd
(You're using lmrc(4D) for this HBA, so it's possible there's some driver corner case, or worse, some &*%^#%*&^ firmware bug, that's messing it up.)
-
jbk
hrm.. does fmdump -e show anything?
-
jvl
i can't get back in, I'd need to reboot
-
jvl
I'm on console, typed password and am just waiting
-
jvl
danmcd: according to drac, this is after 19 minutes of uptime with latest PI
-
jvl
(there's 100% cpu usage peak in graph)
-
jvl
danmcd: there was something about lmrc on the console, but got refreshed
-
danmcd
grep lmrc /var/adm/messages*
-
jvl
i will reboot and try to grab it
-
danmcd
NO!
-
jvl
ok
-
danmcd
Just grab it from /var/adm/messages
-
jvl
but I can't get in
-
danmcd
OH... sorry.
-
danmcd
Still, when you reboot check /var/adm/messages.
-
jvl
I'm logged out and it won't verify the password anymore over ssh or console
-
danmcd
damn. Send an NMI via idrac ? Or just power-cycle I guess. :(
-
jvl
I have idrac access, I'm rebooting. I think this won't be all that rare to reproduce. I messed up with too fast fingers and CTRL+D on the last ssh I had
-
jvl
fwiw, the drives are in non-RAID mode - that was the only option that looked like JBOD
-
jvl
I'm booting from the ISO over DRAC, because the installer suggested log devices from the two SSDs and I can't boot from that. I didn't try to solve that just yet, because I was after the ucode update ...
-
jvl
the box has all the latest firmware from dell
-
jvl
it originally came with an OCP 3.0 NIC with a Broadcom BCM57412 that just can't netboot. After googling around, I found out that there's a firmware issue and a VMware KB saying the NIC doesn't pass frames (communities.vmware.com/t5/ESXi-Disc…etXtreme-E-10Gb-problem/m-p/2986710). The latest available firmware for the NIC (from Dell) still falls into the problematic range, so
-
jvl
we went for that X710 module, but that one has a whitelist for SFP+, and we're using Extreme Networks switches with 40G->4x10G copper, and those needed to be flashed to be compatible.
-
jvl
maybe this saves somebody some hair later
-
jvl
still booting
-
jvl
jbk: fmdump -e
-
jvl
TIME CLASS
-
jvl
fmdump: warning: /var/fm/fmd/errlog is empty
-
jvl
but this is after reboot
-
jvl
I don't think the pool was available to persist the lmrc console message to /var/adm/messages though
-
jvl
I'll try to catch the bug again, I have until Thursday the 7th, 16:00 UTC