#omnios

10:36

al_s

Hi. I am having issues with omnios in AWS. Running omnios-r151046be on t2.large instance. I have 2 instances, pretty much exactly the same. One has been turned off for a while and when started it is getting the clock speed completely wrong.
10:36

al_s

root@ip-10-130-1-149:/localhome/aslate-local# psrinfo -pv
10:36

al_s

The physical processor has 2 virtual processors (0-1)
10:36

al_s

x86 (GenuineIntel 406F1 family 6 model 79 step 1 clock 112 MHz)
10:36

al_s

Intel(r) Xeon(r) CPU E5-2686 v4 @ 2.30GHz
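[Editor's note] The mismatch above (a reported clock of 112 MHz on a part branded 2.30GHz) can be spotted mechanically. The following is a minimal sketch, not an OmniOS tool: the function name `clock_mismatch` is hypothetical, and it assumes the `psrinfo -pv` text has the same shape as the lines pasted in this log.

```python
import re

def clock_mismatch(psrinfo_text):
    """Parse `psrinfo -pv` style text (format assumed from this log).

    Returns (reported_mhz, nominal_mhz, ratio), or None if either
    value cannot be found.
    """
    # Calibrated clock as reported by the kernel, e.g. "clock 112 MHz"
    reported = re.search(r"clock\s+(\d+)\s*MHz", psrinfo_text)
    # Nominal frequency from the CPU brand string, e.g. "@ 2.30GHz"
    nominal = re.search(r"@\s*([\d.]+)\s*GHz", psrinfo_text)
    if reported is None or nominal is None:
        return None
    reported_mhz = float(reported.group(1))
    nominal_mhz = float(nominal.group(1)) * 1000.0
    return reported_mhz, nominal_mhz, reported_mhz / nominal_mhz

# Sample text copied from the messages above
sample = """The physical processor has 2 virtual processors (0-1)
x86 (GenuineIntel 406F1 family 6 model 79 step 1 clock 112 MHz)
Intel(r) Xeon(r) CPU E5-2686 v4 @ 2.30GHz"""

result = clock_mismatch(sample)
if result is not None:
    reported_mhz, nominal_mhz, ratio = result
    print(f"reported {reported_mhz:.0f} MHz, nominal {nominal_mhz:.0f} MHz, ratio {ratio:.3f}")
```

A ratio far from 1 only suggests that calibration went wrong; the brand-string frequency is a nominal figure, so small deviations are not meaningful.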
10:37

al_s

This is causing the machine to have very strange issues
10:37

al_s

root@ip-10-130-1-149:/localhome/aslate-local# time sleep 10
10:37

al_s

real 0m0.817s
10:37

al_s

user 0m0.017s
10:37

al_s

sys 0m0.046s
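[Editor's note] To put a number on the `time sleep 10` result above, the sketch below parses the `real` line and computes how much shorter the measured interval is than the requested one. The helper names `real_seconds` and `apparent_factor` are hypothetical, and the parsing assumes the `NmS.SSSs` format shown in this log. Both the sleep and the measurement depend on the same machine's timekeeping, so this only reports the apparent discrepancy, not which side is wrong.

```python
import re

def real_seconds(time_output):
    """Extract the 'real' elapsed time in seconds from shell `time` output."""
    m = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))

def apparent_factor(requested_seconds, time_output):
    """Ratio of the requested sleep to the elapsed time that `time` reported."""
    elapsed = real_seconds(time_output)
    if not elapsed:
        return None
    return requested_seconds / elapsed

# Sample text copied from the messages above
sample = """real 0m0.817s
user 0m0.017s
sys 0m0.046s"""

factor = apparent_factor(10, sample)
if factor is not None:
    print(f"sleep 10 appeared to finish about {factor:.1f}x too quickly")
```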
10:38

al_s

The other machine is fine and continues to be after rebooting. So seems to be an issue after stopping and restarting.
10:39

al_s

Any ideas?
11:30

ptribble

al_s: we occasionally see AWS giving us a very strange clock, see for example a discussion (and potential resolution) a little while ago
11:30

ptribble

log.omnios.org/illumos/2024-02-09#1707488213-736772
11:59

al_s

Thank you ptribble I am giving that a go now
12:05

al_s

Hmm, now the instance does not start at all, seeing this in the system log
12:06

al_s

OmniOS r151046 Version omnios-r151046-8ab991ea831 64-bit
12:06

al_s

Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
12:06

al_s

Copyright (c) 2017-2024 OmniOS Community Edition (OmniOSce) Association.
12:06

al_s

panic[cpu0]/thread=fffffffffbca07a0: Failed to calibrate TSC
12:06

al_s

Warning - stack not written to the dump buffer
12:06

al_s

fffffffffbca6620 unix:tsc_calibrate+3a5 ()
12:06

al_s

fffffffffbca6640 unix:startup_tsc+1b ()
12:06

al_s

fffffffffbca6650 unix:startup+4a ()
12:06

al_s

fffffffffbca6690 genunix:main+36 ()
12:06

al_s

fffffffffbca66a0 unix:_locore_start+88 ()
12:06

andyf

That's a known problem. The easiest fix is to force use of a different time source. Let me look it up.
12:06

andyf

It's fixed in r151050, fwiw
12:07

andyf

echo set pit_is_broken = 1 > /etc/system.d/pit
12:07

andyf

and reboot
12:08

andyf

Oh, that's what Peter linked to, sorry
12:14

ptribble

Another option is to stop and start the EC2 instance, so AWS brings it up on a different physical host which might have a better behaved environment
12:20

al_s

I have done that at least 5 times already today! Strangely the instance starts fine when changed to a t3 type. I have other issues then though, such as "Instance reachability check failed", despite ping and login working. I have yet to determine if that is an actual problem, but may trigger the instance status check alarm configured to reboot
12:20

al_s

it on hang. We will see.
12:21

al_s

Yep, it does.
12:54

al_s

Now that I am running on a t3 instance, I am worried about illumos.org/issues/16615. Do we know what is likely to provoke the panic?
12:54

fenix

→ BUG 16615: ena assertion failure in ena_tx_intr_work() (Closed) | code.illumos.org/c/illumos-gate/+/3553
14:20

andyf

A lot of network traffic - enough so that upwards of 1024 packets are buffered waiting to be transmitted.
14:20

andyf

There is a hotfix available for r46 at hf.omnios.org/r46/ena-16615.p5p (install with "pkg apply-hot-fix <url>")
