02:34:50 <gitomat> [illumos-gate] 16137 Update tzdata to 2023d -- Robert Mustacchi <rm⊙fo>
19:21:29 <tsoome> introducing qemu to kernel did not eliminate the hung on multi-core VM startup, so I'm rather out of ideas. The only thing I'm sure of is that it does happen while we are attempting to bring non-boot cpus online:)
19:36:16 <richlowe> the only stack I've seen from you, we have an interrupt pinning the pause thread
19:57:59 <tsoome> oh, that I can do.
20:04:55 <tsoome> here it is: https://paste.ec/paste/D-q7OOCq#Zj1c60Guk19jmOXJvwcf1xeNmee6j9b9tPbUJLxrQAX
20:08:40 <richlowe> huh
20:08:48 <richlowe> the warnings are worrying, too
20:09:34 <richlowe> apparently we don't know how to save fpu state
20:10:33 <richlowe> and we got preempted starting that cpu?
20:11:39 <richlowe> does everything stay like that?
20:12:10 <richlowe> and does this look like the hang in qemu 6 thing I forgot the details of? :)
20:13:30 <sommerfeld> but it works some of the time?
20:14:07 <richlowe> I don't think the fpu state warnings would break us yet(?), but from what I remember of rmustacc's xsave stuff, it's not good
20:15:24 <sommerfeld> also the: WARNING: kmem_alloc(): sleeping allocation with size of 0; see kmem_zerosized_log for details
20:17:05 <sommerfeld> I'd track that one down and figure out why it's happening.   at the very least it would get the noise out of the boot log but it might indicate something is mis-sized.
20:18:50 <sommerfeld> what do we know about the multi-core startup?   I assume you're trying 2 cpus - is cpu 0 hanging waiting for cpu 1 to show signs of life?
20:19:56 <sommerfeld> the one time I had to debug something like this I ended embedding a bunch of "store constant to global variable" through early cpu startup code to see evidence that cpu 1 was doing stuff.
20:20:14 <jbk> what system is this? the broken cpuid bits seem strange... I thought some things will key off of data returned from them, but not sure if any of that would be relevant to this
20:26:29 <richlowe> sommerfeld: i admit I assumed, but I think that's probably the save area we got told has a size of 0
20:26:53 <richlowe> jbk: could you re-up the err format review? :)
20:42:11 <jbk> done
21:28:16 <tsoome> yes, it does work sometimes. I actually did try to use some intel cpus there (to eliminate those amd related warnings), but that did not really change the behavior.
21:40:35 <tsoome> sommerfeld actual setup is 4 cores, but 2 will hung just as well. at this point, we actually do have 2 cpus detected and somewhat configured. 
21:50:41 <tsoome> well, apparently it is not just completely hung, thread on cpu0 is processing interrupts - it does look like it is rather busy doing so and maybe at some point of time it will get back to onlining cpu1:D
21:53:01 <sommerfeld> maybe a stuck/not properly acked interrupt?
21:54:21 <tsoome> ::interrupts https://paste.ec/paste/wBcW-IlD#7sR-6+NpmBOMj9wPjHZRBbcGitRYgFB0sWf+u2+56fv
22:02:13 <tsoome> cpu1 has thread fffffe000fca3c20, but it has nothing in stack.  reported by ::stacks is 'fffffe000fca3c20 ONPROC   <NONE>                  2' and '<no consistent stack found>'
22:10:19 <sommerfeld> either stuck spinning, or not actually running anything yet...
22:14:05 <tsoome> ok, got the RIP of CPU1 from qemu info regs, it is sitting in fffffffffb8a4f68 is pause_cpus+0x128, it seems to be https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/cpu.c?r=3df2e8b2&mo=54021&fi=1968#1069
22:14:54 <tsoome> pause_cpus+0x128:               movzbl 0xfffffffffbc59640(%rdx),%eax    <safe_list>
22:18:01 <sommerfeld> so it's spinning, waiting for the other cpu(s) to pause..
22:18:52 <tsoome> yep
22:20:23 <tsoome> RAX=0000000000000001  so it is != PAUSE_WAIT (2)
22:21:56 <tsoome> [0]> safe_list::print                 
22:21:56 <tsoome> [ '\001', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002
22:21:56 <tsoome> ', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\002', '\00
22:21:56 <tsoome> 2', '\002', '\002', '\002', '\002', ... ]
22:22:29 <sommerfeld> and cpu0 is not cooperating.
22:22:30 <tsoome> so the cpu0 does not want to move from PAUSE_READY to PAUSE_WAIT
22:22:33 <tsoome> yep
22:23:17 <tsoome> at least the mechanism is understandable now:D
22:25:30 <tsoome> this is what cpu0 is doing at that moment: https://paste.ec/paste/nSdvfoOH#xjkzzC9TQYx8WV8at885QZnjKAt4JYrGdUTHV-BXEkW
22:28:06 <tsoome> so we got _interrupt while running cpu_pause 
22:32:47 <sommerfeld> interrupted in the: 	while (cpi->cp_go == 0) ; ?
22:33:00 <sommerfeld> before it could splhigh() ?
22:33:49 <tsoome> it seems, in sema_v()  just a sec, getting paste...
22:34:06 <tsoome> https://paste.ec/paste/B31p+UIt#rdJzPGjfPvUDLrLJtgmTDjUQZpp2wh2JBoZmo8cqSDA
22:34:42 <tsoome> cpu_pause+0x58, but yes, before spllhigh()
22:35:16 <tsoome> seems like race case
22:37:44 <tsoome> in cpu_pause_info we have cp_go = 0x1
22:41:49 <sommerfeld> so perhaps looping in the interrupt handler on cpu0, not able to reach the test on cp_go ..
22:44:48 <tsoome> "...we don't want to spl because that might block clock interrupts needed to preempt threads on other CPUs." -- but there are no other cpus yet.
22:45:09 <tsoome> so thats the catch there.
22:59:40 <sommerfeld> possibly looping in tsc_gethrtime() ?
23:04:13 <tsoome> no, I mean, we only do have 2 cpus set up yet, we are about to online cpu1, but we need to quiesce cpu0 first, but there is small window between rising semafore and going splhigh() during which we are actually interrupted ...
23:06:07 <tsoome> um, well, to put it other way around -- if all other cpus are paused, we do not need to keep that window any more