07:36:22 danmcd: it happened again overnight, here's a screenshot from the iDRAC console https://pasteboard.co/33PGQCyhNjJr.png
10:00:55 I have put "route add -inet default _IP_address_" in a post-boot script in SmartOS, so that internet access from the global zone's 'admin' interface goes through an alternate gateway on the admin network instead of the one DHCP hands out as the gateway.
10:01:59 The result is that when I run traceroute in the global zone, one time it goes through the gateway it got from DHCP and the next time through the alternate IP address from the route add command.
10:02:31 How do I make it always use the alternate IP as the gateway, instead of the one given to the admin interface by DHCP?
10:22:21 danmcd: I've reinstalled to those two SSDs after sdc-factoryreset and made a striped pool out of those 12x4T SATA drives. dd if=/dev/zero of=/data/testfile bs=1M panics the box after a while. I've got a video of the console, and a screenshot of the panic and dump is at https://pasteboard.co/oJjGhng7CTFC.png. It is reproducible.
10:23:07 the quicktime video is 20M, but I have no idea where to share that
10:23:49 there's nothing in DRAC and the box was rock solid in linux for 14 days, so I don't have a reason to doubt the HW.
10:26:38 jvl, where did you create the pool? On SmartOS/illumos or OpenZFS/ZoL/linux? There might be some feature flags on the pool that are in OpenZFS and not available on illumos.
10:26:40 Plus, a striped pool of many drives sounds like an invitation to data loss. Maybe several raidz VDEVs?
10:27:04 it's a fresh install from the latest smartos PI
10:27:06 only smartos
10:27:57 the box was wiped clean after linux; it's on loan from the vendor for something else. I'm just testing smartos there, primarily the ucode update, because it has a new enough Intel CPU, but I ran into trouble with lmrc
10:28:23 nikolam: this is really just testing, I'm not going to use it for anything and need to return it on thursday
10:28:24 why striped 12 drives?
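On the 10:00:55 default-route question: a traceroute that alternates between two gateways is consistent with two IPv4 default routes sitting side by side, the one installed from DHCP and the one added by the post-boot script. A minimal sketch, assuming the goal is to remove every default route except the alternate one once the admin interface has its DHCP lease: it reads `netstat -rn -f inet` output and prints `route` commands instead of running them. The address 192.0.2.1 is a documentation placeholder, `plan_default_route` and `default_gateways` are hypothetical helpers, and whether the DHCP client re-installs its route on lease renewal is not verified here.

```shell
# Sketch (POSIX sh): leave the alternate gateway as the only IPv4 default
# route. Nothing is executed; route(8) commands are printed for review.

# Print the gateway column of each "default" row of `netstat -rn -f inet`
# output read from stdin.
default_gateways() {
    awk '$1 == "default" { print $2 }'
}

# Usage: netstat -rn -f inet | plan_default_route ALT_GW
# Prints a delete for every default route not via ALT_GW, and an add for
# ALT_GW if it is not already a default route.
plan_default_route() {
    if [ -z "$1" ]; then
        echo "usage: plan_default_route ALT_GW" >&2
        return 1
    fi
    alt_gw=$1
    have_alt=no
    for gw in $(default_gateways); do
        if [ "$gw" = "$alt_gw" ]; then
            have_alt=yes
        else
            echo "route delete -inet default $gw"
        fi
    done
    if [ "$have_alt" = no ]; then
        echo "route add -inet default $alt_gw"
    fi
}
```

In a post-boot script this could run as `netstat -rn -f inet | plan_default_route 192.0.2.1 | sh` once the printed plan looks right. Printing first keeps the script from deleting the only working default route by mistake.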
10:29:14 having 12 striped drives shouldn't lead to a null pointer dereference in lmrc IMO
10:29:31 sysinfo?
10:31:19 jvl, can you paste sysinfo somewhere?
10:33:38 jvl, the message you shared previously says exactly that pool zones encountered an I/O failure. So with those drives, striping 12 drives seems like a bad choice.
10:44:20 If speed is needed, 6 mirrored-pair VDEVs might help. Or 4 raidz VDEVs for capacity. But 4 three-drive mirrors or 2 six-drive raidz2 VDEVs are more sound.
10:46:39 jvl: do you have a core dump of that lmrc null pointer dereference?
10:48:31 jvl: and do you know what model of HBA you have in that box?
11:03:57 nikolam: sysinfo here: https://pastebin.com/4nApyXyM
11:05:02 jvl, with 12 striped drives, YMMV
11:06:24 nikolam: previously I had a raidz2 pool with 12 drives, the SSDs as mirrored log devices and one spare (suggested by the installer). I was seeing write errors on all of them and lost I/O on that box, so it seems that all drives went away. The first screenshot I posted up there suggests that the whole HBA tried to reset, but never came back out of it.
11:06:41 Woodstock: card is DELL PERC H700
11:06:46 looking for the dump
11:08:30 I don't have any vmdump.x under /var
11:08:36 /var/crash/volatile is empty
11:12:12 Woodstock: netbooting linux to get you a better HBA name from lspci
11:15:59 Woodstock: 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx Subsystem: Dell PERC H755 Adapter
11:19:06 hm
11:19:38 DRAC sees it as PERC H755 Adapter
11:19:59 too
11:20:08 ok, thanks
11:20:39 so i assume you have your zones pool attached to that?
11:21:41 this is a 2U server with 4x3 3.5" bays in the front and two 2.5" bays in the back.
11:23:01 I'm booting from those SSDs, but in DRAC and even in the BIOS of the PERC it seems that there are just two backplanes for drives connected to the PERC
11:25:00 hm yes, looking at that screenshot it's quite obvious that you don't get a dump because the same panic in lmrc happens again in the dump code path.
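The 10:44:20 layout suggestions can be tried without touching the disks, because `zpool create -n` prints the configuration it would build and then exits. A sketch, assuming hypothetical disk names c1t0d0 through c1t11d0 and a pool called data (matching the /data test path); `vdev_spec` is a helper invented for this example that splits a disk list into equal groups, each prefixed with a vdev type. "4 raidz VDEVs" is read here as four 3-disk raidz groups.

```shell
# Sketch (POSIX sh): build a zpool vdev specification by splitting the disks
# into groups of WIDTH, each prefixed with TYPE (mirror, raidz, raidz2, ...).
# Usage: vdev_spec TYPE WIDTH disk...
#
# Dry-run examples (-n only prints the layout; disk names are hypothetical):
#   zpool create -n data $(vdev_spec mirror 2 c1t0d0 ... c1t11d0)  # 6 two-way mirrors
#   zpool create -n data $(vdev_spec mirror 3 c1t0d0 ... c1t11d0)  # 4 three-way mirrors
#   zpool create -n data $(vdev_spec raidz2 6 c1t0d0 ... c1t11d0)  # 2 six-disk raidz2
vdev_spec() {
    if [ $# -lt 3 ]; then
        echo "usage: vdev_spec TYPE WIDTH disk..." >&2
        return 1
    fi
    vs_type=$1
    vs_width=$2
    shift 2
    # Refuse layouts where the disks don't divide evenly into groups.
    if [ "$vs_width" -le 0 ] || [ $(( $# % vs_width )) -ne 0 ]; then
        echo "vdev_spec: $# disks do not split into groups of $vs_width" >&2
        return 1
    fi
    vs_spec=""
    vs_i=0
    for vs_disk in "$@"; do
        # Start a new vdev group every WIDTH disks.
        if [ $(( vs_i % vs_width )) -eq 0 ]; then
            vs_spec="$vs_spec $vs_type"
        fi
        vs_spec="$vs_spec $vs_disk"
        vs_i=$(( vs_i + 1 ))
    done
    echo "${vs_spec# }"
}
```

For 12 x 4 TB drives the usable space, before ZFS overhead and TB/TiB conversion, works out to: striped 48 TB with no redundancy, 6 two-way mirrors 24 TB, 4 three-way mirrors 16 TB, 4 three-disk raidz 32 TB, and 2 six-disk raidz2 32 TB. The two 32 TB layouts differ in fault tolerance: raidz2 survives any two failures per group, raidz only one.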
11:30:03 i think i see what the problem is, and it makes me wonder why i never saw that happen during testing.
11:43:45 jvl: are you aware of this issue? https://www.illumos.org/issues/15935
11:44:06 it still shouldn't panic, though.
11:44:14 Woodstock: I hadn't seen the issue before
11:45:15 seems like the issue I had at first with that huge raidz2 pool with log devices.
11:45:45 first instance 19 minutes after boot, second in an hour or so
11:46:48 We're returning this particular unit to the vendor tomorrow, but we'll be buying these for linux next year, so I can test some more if necessary
11:47:46 seems like we already have two more of these PE750xs in to act as backup servers, but I don't have a schedule for when they'll go into production.
11:49:17 if I can help get more info for debugging, I'm available
11:54:27 fwiw, I'm seeing it on firmware ControllerFirmwareVersion 52.21.1-5149; DELL has 52.26.0-5179 now.
12:30:15 same result with the latest PERC firmware (sorry, I said I had the latest before; I hadn't checked again in those couple of weeks)
12:32:27 the stack trace with the latest firmware looks longer https://snipboard.io/SXHCnT.jpg
13:08:05 "same result" here means the box panics on that dd. I've yet to confirm the behavior with drives missing from the bus
15:24:29 (Thank you @Woodstock for jumping in on this.)
17:03:29 Woodstock: with regard to https://www.illumos.org/issues/15935 - it was created on 28th Sept, and DELL released 52.21.1-5149 on 5th Oct, which should solve controller hangs with "improved DDR4 memory settings"
17:03:42 that's one version before current.
17:03:51 full changelog here: https://dl.dell.com/FOLDER10801914M/5/SAS_RAID_Firmware_11.5_52.26.0.5179_RELNOTES.txt?uid=63045c64-5ce0-4fbf-b85e-8458e5cdbfe2&fn=SAS_RAID_Firmware_11.5_52.26.0.5179_RELNOTES.txt
17:04:13 w
17:04:38 I'm still waiting for the disks to disappear with 52.26.0.5179
17:06:07 jvl: it doesn't
17:08:26 we've tested that.
we even had a call with dell about this issue, but there hasn't been any progress so far
17:09:19 i'm going to fix that panic, and if you like you can test it :)
17:38:54 Woodstock: sure thing wuth the test!
17:38:57 with
17:55:44 I'm booting from an SSD mirrored pool, so I guess I'll need an ISO for piadm (or DRAC)
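On the 17:55:44 point about getting a test build onto a box that boots from its own SSD pool: piadm(8) installs platform images onto a bootable pool, and our reading of its manual is that `piadm install` accepts a local ISO or tgz path as the source. A sketch under that assumption; the ISO path /var/tmp/test-pi.iso and pool name zones are placeholders, and `stage_test_pi` and `run` are hypothetical helpers. Setting DRY_RUN=1 makes it print the commands instead of running them.

```shell
# Sketch (POSIX sh): stage a test platform image on a bootable pool with
# piadm(8). Paths and pool names are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: stage_test_pi SOURCE POOL
# SOURCE must be an .iso or .tgz; anything else is rejected before piadm runs.
stage_test_pi() {
    if [ $# -ne 2 ]; then
        echo "usage: stage_test_pi SOURCE POOL" >&2
        return 1
    fi
    pi_src=$1
    pi_pool=$2
    case $pi_src in
        *.iso|*.tgz) ;;
        *)
            echo "stage_test_pi: expected an .iso or .tgz, got '$pi_src'" >&2
            return 1
            ;;
    esac
    run piadm install "$pi_src" "$pi_pool" || return 1
    # Show the pool's platform images so the new PI stamp can be picked out.
    run piadm list "$pi_pool"
}
```

Booting the new image is then a separate `piadm activate <PI-stamp> zones` step, using the stamp that `piadm list` shows; keeping it separate leaves the current PI as the next boot until the test build is deliberately selected.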