20:10:31 danmcd: Hi dan, just sent you (email) the output for the microcode update on Intel(r) Xeon(r) Silver 4310 CPU @ 2.10GHz (updates from: 0xd0003a5 to : 0xd0003b9). The box also has 10GigE nics with i40e driver (Intel X710). I remember something in smartos-discuss, but I blank on the details ... I can test something here if necessary too.
20:11:00 Looks like a good ucode update to me!
20:11:32 As for i40e... the problem part there is the X722 + 10GBaseT (if not the X722 altogether, but I never had X722 + SFP to test).
20:11:39 X710 should be fine.
20:11:54 (If a little memory-hoggy, but that's an i40e problem per se, and it's being looked at).
20:14:20 by who?
20:14:59 danmcd: all okay then!
20:17:39 (just want to be sure there's no overlap :P)
20:24:45 lol, zpool on that dell 750xs is dead
20:24:51 30 minutes after install
20:25:35 https://pastebin.com/hfhsm5Ui
20:27:28 jbk --> you. :)
20:28:09 heh ok.. I do have a patch for that if anyone wants to test... I should ask Marsell how it's been going.. it's been quiet which I hope is good
20:28:22 jvl: Errors but not actual corruption. I'm curious as to how fast these counters move up?
20:28:50 we've been using it internally for a while, and hopefully will have it available to our customers soon
20:28:54 (You're using lmrc(4D) for this HBA, so it's possible there's some driver corner case, or worse, some &*%^#%*&^ firmware bug, that's messing it up.)
20:29:23 hrm.. does fmdump -e show anything?
20:29:35 i can't get back in, I'd need to reboot
20:30:02 I'm on console, typed password and am just waiting
20:31:56 danmcd: according to drac, this is after 19 minutes of uptime with latest PI
20:32:06 (there's 100% cpu usage peak in graph)
20:32:31 danmcd: there was something about lmrc on the console, but got refreshed
20:32:44 grep lmrc /var/adm/messages*
20:32:49 i will reboot and try to grab it
20:32:52 NO!@
20:32:56 ok
20:32:59 Just grab it from /var/adm/messages
20:32:59 but I can't get in
20:33:05 OH... sorry.
20:33:27 Still, when you reboot check /var/adm/messages.
20:33:49 I'm logged out and it won't verify the password anymore over ssh or console
20:37:12 damn. Send an NMI via idrac ? Or just power-cycle I guess. :(
20:38:03 I have idrac access, I'm rebooting. I think this won't be all that rare to reproduce. I messed up with too fast fingers and CTRL+D on the last ssh I had
20:38:15 fwiw, drives are in non-raid mode - that's all that looked like JBOD
20:38:56 booting from iso over drac as installer suggested log devices from the two SSDs and I can't boot from that. I didn't try to solve that just yet, because I was after the ucode update ...
20:39:24 the box has all the latest firmware from dell
20:43:18 it originally came with OCP3.0 nic with broadcom BCM57412 that just can't netboot. after googling around, I found out that there's a firmware issue and KB with vmware that the nic doesn't pass frames (https://communities.vmware.com/t5/ESXi-Discussions/ESXi-8u1-and-BCM57412-NetXtreme-E-10Gb-problem/m-p/2986710) latest available firmware for the NIC (from dell) still falls into the problematic range. so
20:43:24 we went for that X710 module, but that one has whitelist for SFP+ and we're using extreme network switches 40G->4x10G copper and those needed to be flashed to be compatible.
20:43:28 maybe this saves some hair to somebody later
20:43:38 still booting
21:04:25 danmcd: https://pastebin.com/DM7dp497
21:06:34 jbk: fmdump -e
21:06:35 TIME CLASS
21:06:35 fmdump: warning: /var/fm/fmd/errlog is empty
21:06:39 but this is after reboot
21:18:15 I don't think the lmrc message on console had pool available to persist it in /var/adm/messages though
21:19:41 I'll try to catch the bug again, I have until thursday 7th 16.00 UTC