-
xmerlin: In recent months, I have experienced two spontaneous reboots of a SmartOS machine. I haven't detected any hardware problems in the logs. How can I further investigate the issue? I am currently using the release 20231116T064739Z.
-
xmerlin: The last reboot occurred a few minutes ago.
-
otis: i have also experienced reboots with no traces in the logs. turned out to be a faulty SAS HBA.
-
neuroserve: cores?
-
jvl: xmerlin, the last time SmartOS was crashing on me, it was because of faulty non-ECC memory. Worth checking that (or ipmitool sel elist if you have server-grade hardware with ECC).
-
jvl: fmadm faulty would report ECC memory errors too and, what surprised me, SmartOS actually stops using those problematic pages.
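
The two checks jvl mentions can be run together. This is a minimal sketch, assuming a root shell in the global zone; the run_if_present helper is hypothetical (not part of SmartOS), and fmdump -e is an addition not mentioned in the chat, included because it lists the raw FMA error log that fmadm diagnoses draw on.

```shell
#!/bin/sh
# Hypothetical helper: run a diagnostic command only if it is installed,
# and keep going even if the command itself fails.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" || echo "warn: '$*' exited with status $?"
    else
        echo "skip: $1 not found"
    fi
}

# Faults already diagnosed by the fault manager (includes ECC memory faults).
run_if_present fmadm faulty

# Assumption beyond the chat: raw error telemetry from the FMA error log.
run_if_present fmdump -e

# BMC system event log; only useful on hardware with IPMI.
run_if_present ipmitool sel elist
```

An empty fmadm faulty listing together with a clean IPMI event log does not rule out hardware; it only means nothing was recorded before the reset.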
-
xmerlin: otis, full NVMe, no HBA involved
-
xmerlin: fmadm, server grade ECC, no memory errors in IPMI
-
xmerlin: neuroserve, 2x AMD EPYC 7402 24-Core