-
al_s
Hi. I am having issues with OmniOS in AWS, running omnios-r151046be on a t2.large instance. I have 2 instances that are pretty much exactly the same. One had been turned off for a while, and since being started again it gets the clock speed completely wrong.
-
al_s
root@ip-10-130-1-149:/localhome/aslate-local# psrinfo -pv
The physical processor has 2 virtual processors (0-1)
x86 (GenuineIntel 406F1 family 6 model 79 step 1 clock 112 MHz)
Intel(r) Xeon(r) CPU E5-2686 v4 @ 2.30GHz
-
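For anyone hitting the same symptom: the telling detail in the psrinfo -pv output above is that the clock rate the kernel measured (112 MHz) disagrees wildly with the nominal 2.30GHz in the CPU brand string. The sketch below automates that comparison. check_clock is a hypothetical helper, not an OmniOS tool, and the 0.5x to 2x tolerance is an arbitrary assumption meant only to catch gross miscalibration like this one.

```sh
#!/bin/sh
# Hypothetical helper (not part of OmniOS): reads `psrinfo -pv` output on
# stdin and compares the kernel-reported "clock N MHz" with the nominal
# "@ X.XXGHz" rate from the CPU brand string.
# Exit status: 0 = plausible, 1 = mismatch, 2 = could not parse.
check_clock() {
    awk '
        # e.g. "x86 (GenuineIntel ... step 1 clock 112 MHz)"
        /clock [0-9]+ MHz/ {
            for (i = 1; i <= NF; i++)
                if ($i == "clock") reported = $(i + 1) + 0
        }
        # e.g. "Intel(r) Xeon(r) CPU E5-2686 v4 @ 2.30GHz"
        /@ [0-9.]+GHz/ {
            s = $0
            sub(/.*@ /, "", s)
            sub(/GHz.*/, "", s)
            nominal = int(s * 1000 + 0.5)   # GHz -> MHz, rounded
        }
        END {
            if (reported == 0 || nominal == 0) {
                print "unknown"
                exit 2
            }
            # Assumed tolerance: flag anything outside 0.5x..2x of nominal.
            ratio = reported / nominal
            if (ratio < 0.5 || ratio > 2.0) {
                printf "MISMATCH reported=%d MHz nominal=%d MHz\n", reported, nominal
                exit 1
            }
            printf "ok reported=%d MHz nominal=%d MHz\n", reported, nominal
        }
    '
}
```

On an affected host you would pipe psrinfo -pv into check_clock; with the output above it prints a MISMATCH line and exits 1. With several sockets it only looks at the last processor listed, which is fine for a quick sanity check.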
al_s
This is causing the machine to have very strange issues
-
al_s
root@ip-10-130-1-149:/localhome/aslate-local# time sleep 10
real 0m0.817s
user 0m0.017s
sys 0m0.046s
-
al_s
The other machine is fine, and stays fine after rebooting. So it seems to be an issue that appears after stopping and restarting the instance.
-
al_s
Any ideas?
-
ptribble
al_s: we occasionally see AWS giving us a very strange clock; see, for example, a discussion (and potential resolution) from a little while ago
-
al_s
Thank you ptribble, I am giving that a go now.
-
al_s
Hmm, now the instance does not start at all. I'm seeing this in the system log:
-
al_s
OmniOS r151046 Version omnios-r151046-8ab991ea831 64-bit
Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
Copyright (c) 2017-2024 OmniOS Community Edition (OmniOSce) Association.
panic[cpu0]/thread=fffffffffbca07a0: Failed to calibrate TSC
Warning - stack not written to the dump buffer
fffffffffbca6620 unix:tsc_calibrate+3a5 ()
fffffffffbca6640 unix:startup_tsc+1b ()
fffffffffbca6650 unix:startup+4a ()
fffffffffbca6690 genunix:main+36 ()
fffffffffbca66a0 unix:_locore_start+88 ()
-
andyf
That's a known problem. The easiest fix is to force use of a different time source. Let me look it up.
-
andyf
It's fixed in r151050, fwiw
-
andyf
echo set pit_is_broken = 1 > /etc/system.d/pit
-
andyf
and reboot
-
andyf
Oh, that's what Peter linked to, sorry
-
ptribble
Another option is to stop and start the EC2 instance, so AWS brings it up on a different physical host, which might have a better-behaved environment.
-
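The stop/start cycle ptribble describes can be scripted with the standard AWS CLI. A plain reboot would not help here, since a reboot keeps the instance on the same physical host. This is a minimal sketch: the instance ID is a placeholder, and the AWS variable is a hypothetical hook (defaulting to the real aws command) so the sequence can be dry-run against a stub.

```sh
#!/bin/sh
# Sketch: full stop/start of an EC2 instance (not a reboot), so AWS may
# place it on different hardware. AWS is a hypothetical override hook;
# by default it runs the real aws CLI.
AWS=${AWS:-aws}

cycle_instance() {
    id=$1
    # Request the stop, then block until EC2 reports the instance stopped.
    $AWS ec2 stop-instances --instance-ids "$id" >/dev/null || return 1
    $AWS ec2 wait instance-stopped --instance-ids "$id" || return 1
    # Start it again and wait until it is running.
    $AWS ec2 start-instances --instance-ids "$id" >/dev/null || return 1
    $AWS ec2 wait instance-running --instance-ids "$id"
}
```

Usage would be along the lines of cycle_instance i-0123456789abcdef0 (placeholder ID). "running" only means the hypervisor has started the instance; whether OmniOS then boots cleanly still has to be checked in the system log.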
al_s
I have done that at least 5 times already today! Strangely, the instance starts fine when changed to a t3 type. I have other issues then, though, such as "Instance reachability check failed" despite ping and login working. I have yet to determine whether that is an actual problem, but it may trigger the instance status check alarm that is configured to reboot it on hang. We will see.
-
al_s
Yep, it does.
-
al_s
Now that I am running on a t3 instance, I am worried about illumos.org/issues/16615. Do we know what is likely to provoke the panic?
-
fenix
BUG 16615: ena assertion failure in ena_tx_intr_work() (Closed) | code.illumos.org/c/illumos-gate/+/3553
-
andyf
A lot of network traffic: enough that upwards of 1024 packets are buffered waiting to be transmitted.
-
andyf
There is a hotfix available for r46 at hf.omnios.org/r46/ena-16615.p5p (install with "pkg apply-hot-fix <url>")