19:09:31 I'm getting bunches and bunches of these in dmesg:
19:09:33 Dec 19 11:07:00 hvfs1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 5 has task file error
19:09:33 Dec 19 11:07:00 hvfs1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 5 is trying to do error recovery
19:09:33 Dec 19 11:07:00 hvfs1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 5 task_file_status = 0x451
19:09:33 Dec 19 11:07:00 hvfs1 ahci: [ID 332577 kern.warning] WARNING: ahci0: the below command (s) on port 5 are aborted
19:09:36 Dec 19 11:07:00 hvfs1 ahci: [ID 117845 kern.warning] WARNING: satapkt 0xfffffe844ff199e8: cmd_reg = 0xb0 features_reg = 0x0 sec_count_msb = 0x0 lba_low_msb = 0x4f lba_mid_msb = 0x4f lba_high_msb = 0x0 sec_count_lsb = 0x0 lba_low_lsb = 0x0 lba_mid_lsb = 0x4f lba_high_lsb = 0xc2 device_reg = 0x0 addr_type = 0x4 cmd_flags = 0x12
19:09:40 Dec 19 11:07:00 hvfs1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 5 succeed
19:09:48 also saw for port 4, suspect lower numbers as well but it's scrolling off the top of the log.
19:10:48 ah, nope, just 4 & 5
19:15:11 also, should 'smbd[554]: [ID 801721 daemon.error] SMF initialization problem: %s#012: handle not bound' be reported in the OmniOS or Illumos bug tracker?
19:23:32 illumos, but that looks unfortunate.
19:23:51 that's on x86 right?
19:24:11 I have a bug open for the %s thing at least, and I even wrote a patch but I need to rework it
19:24:21 Ok, that's a thing that sometimes happens?
19:24:29 last time I saw that was on another platform, and it was Bad.
19:25:00 The fact the handle is not bound might be, but the wonky error message is illumos 15132 (fenix)
19:25:02 BUG 15132: libsmb SMF initialization problem messages incorrect (In Progress)
19:25:02 ↳ https://www.illumos.org/issues/15132 | https://code.illumos.org/c/illumos-gate/+/2484
19:25:14 I just need to find 30 minutes to finish that off
19:25:18 whew
19:26:50 richlowe, yes, x86.
19:27:16 should I not bother filing the report about the 'handle not bound' thing then?
19:27:25 any hints on what I should be doing about the ahci problem?
19:37:51 I'm assuming those are real disks on ports 4 and 5?
19:38:34 Are you running smartctl or smartmon? I've noticed when I run smartctl I see the cmd_reg... line, followed by the recovery line.
19:44:34 I'm running smartd.
19:47:04 hmm. I'm starting to think there's a real hardware problem here :(
19:47:24 I'm migrating a bunch of VMDKs back to this host (just patched & rebooted) and it's going very slowly.
19:47:37 Dec 19 11:34:35 hvfs1 mr_sas: [ID 992625 kern.notice] mr_sas0: cmd retry count = 1
19:47:38 Dec 19 11:34:35 hvfs1 mr_sas: [ID 992625 kern.notice] mr_sas0: cmd retry count = 1
19:47:38 Dec 19 11:34:35 hvfs1 mr_sas: [ID 692495 kern.notice] NOTICE: mr_sas0: TBOLT adapter reset successfully
19:48:28 Dec 19 11:34:00 hvfs1 mr_sas: [ID 270009 kern.warning] WARNING: mr_sas0: io_timeout_checker: FW Fault, calling reset adapter
19:48:29 Dec 19 11:34:00 hvfs1 mr_sas: [ID 643100 kern.notice] mr_sas0: io_timeout_checker: fw_outstanding 0x2A max_fw_cmds 0x39F
19:48:29 Dec 19 11:34:14 hvfs1 mr_sas: [ID 229198 kern.notice] NOTICE: Device scan in progress ...#012
19:48:29 Dec 19 11:34:35 hvfs1 mr_sas: [ID 564675 kern.notice] NOTICE: mr_sas0: mrsas_print_pending_cmds(): Called
19:48:33 oh yeah, not looking happy :(
19:48:54 ZFS isn't complaining at all.
19:54:12 ugh. Looks like 15 DEC saw a burst of these errors. Yeah, it's not unique to today.
19:54:16 * nomad feels so special
19:57:06 Is there a way to map "NOTICE: Device scan in progress ...#012" (logged by mr_sas to /var/adm/messages) to an actual device?
19:57:24 all of the hits I'm seeing in splunk mention #012 so I'm hoping that's just a failing SSD I can kick out of the pool.
20:04:07 I don't know it, but there likely is. (Unless it's dynamic every boot?)
20:05:01 I'm seeing it across multiple boots with the same number so I presume it's static.
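One thing worth checking before treating #012 as a device number: 012 is octal for 10, the ASCII line feed, and some syslog daemons (rsyslog, for example) write control characters as # followed by three octal digits. The same #012 also shows up in the smbd message earlier, which has nothing to do with a disk. Assuming that is the escaping in play here (not confirmed for this host's logging setup), a rough sketch for undoing it looks like the following; `decode_syslog_escapes` is a made-up helper name, not part of any tool mentioned above.

```python
import re

# Assumption: the logger writes control characters as '#' plus three octal
# digits (rsyslog-style escaping). This is a guess about this host's setup.
_ESCAPE = re.compile(r"#([0-3][0-7]{2})")


def decode_syslog_escapes(line: str) -> str:
    """Turn '#ooo' octal escapes back into the control characters they stand for."""

    def repl(match: re.Match) -> str:
        code = int(match.group(1), 8)
        # Only undo escapes that map to control characters; leave anything
        # else (e.g. a literal "#101") untouched.
        if code < 0x20 or code == 0x7F:
            return chr(code)
        return match.group(0)

    return _ESCAPE.sub(repl, line)


if __name__ == "__main__":
    for msg in (
        "NOTICE: Device scan in progress ...#012",
        "SMF initialization problem: %s#012: handle not bound",
    ):
        # repr() makes the decoded newline visible as \n
        print(repr(decode_syslog_escapes(msg)))
```

If both messages decode to a trailing or embedded newline, the #012 is most likely just a "\n" left in the driver's message string, and would not identify a device.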
20:06:56 * nomad sighs
20:07:24 I told the vendor we bought this box from we didn't need hardware RAID but they seem to have ignored that specification and given me an HBA that really wants to 'help'.
20:07:35 megaraid is megaannoying.
20:08:37 I have a feeling I'm going to be kicking this over to them to resolve.
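Going back to the ahci warnings at the top of the log: the numbers can be picked apart a little, assuming the driver's task_file_status is the AHCI port task file data value (ATA status in the low byte, ATA error in the next byte). The sketch below is only an illustration under that assumption; the function names are hypothetical and the bit tables list just the commonly named flags.

```python
# Commonly named ATA status and error register bits (not an exhaustive list).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

ATA_CMD_SMART = 0xB0           # ATA SMART command opcode
SMART_LBA_MID = 0x4F           # SMART key value expected in LBA mid
SMART_LBA_HIGH = 0xC2          # SMART key value expected in LBA high


def flag_names(value: int, table: dict) -> list:
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_task_file_status(tfd: int) -> dict:
    """Split a task file value into status/error bytes.

    Assumes the AHCI layout: bits 7:0 are ATA status, bits 15:8 are ATA error.
    """
    status = tfd & 0xFF
    error = (tfd >> 8) & 0xFF
    return {
        "status": status,
        "status_flags": flag_names(status, STATUS_BITS),
        "error": error,
        "error_flags": flag_names(error, ERROR_BITS),
    }


def looks_like_smart(cmd_reg: int, lba_mid: int, lba_high: int) -> bool:
    """True if the registers match an ATA SMART command with its key values."""
    return (cmd_reg, lba_mid, lba_high) == (ATA_CMD_SMART, SMART_LBA_MID, SMART_LBA_HIGH)


if __name__ == "__main__":
    # Values copied from the ahci warnings in the log.
    print(decode_task_file_status(0x451))
    print(looks_like_smart(cmd_reg=0xB0, lba_mid=0x4F, lba_high=0xC2))
```

Read that way, 0x451 is status 0x51 (DRDY and ERR set) with error 0x04 (ABRT), and the aborted packet has the SMART opcode with the 0x4f/0xc2 key values. That would fit the smartctl/smartd observation above: the drive or port aborting a SMART command rather than failing a read or write. This is an inference from the register values, not something confirmed in the conversation.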