-
jbk
nice.. hopefully some illumos people can make it
-
jbk
hrm... this is a potentially interesting state on this machine
-
jbk
it appears to have deadlocked while starting up the cpus...
-
jbk
I think while issuing the cross call to start a cpu
-
jbk
also, ::cpustack on a particular CPU is failing with EINVAL which seems odd
-
rzezeski
The place this ipv6 fastpath/smartos issue has taken me are fascinating
-
rzezeski
(that's not the problem, just gave me a laugh)
-
rzezeski
Given that token ring lost, and we have no way to test it even if it didn't, I imagine we could remove any vestiges of support. I don't even know how TR works or if we even have drivers; and linux pulled their support for it in 2012.
-
jbk
the key thing is to make sure it doesn't fall off the end of the wire :)
-
jbk
it being the token
-
jbk
in theory it's supposed to behave better in that you don't really get collisions with token ring, but I believe in the end between switches becoming a thing (vs. hubs) and moar bandwidth, those advantages lost out vs. the complexity (plus wasn't it an IBM thing meaning they charged more for it?)
-
rzezeski
From what wikipedia says it lost to fast/gig ethernet.
-
rzezeski
I'd say that's squarely in the dead column.
-
sommerfeld
jbk: some aspects of various token rings had some advantage over original CSMA/CD broadcast 10Mb/s ethernet. But ethernet switching (now ubiquitous) firmly beats relaying packets through half the nodes on the network on average..
-
sommerfeld
(there were many token rings. Proteon had one. Apollo had another (with 4K+headers with header/data split which made zero-copy page-flipping efficient). IBM's was big for a while. FDDI finally got clobbered by gig ethernet.)
-
jbk
ahh fddi.. i remember briefly dealing with that back in the early 00's
-
jbk
sprint was using it for their backup network at the time
-
jbk
so the latest bits on this HPE server are making me wonder if we're deadlocking the APICs somehow...
-
jbk
i still feel like i'm fumbling a bit in the dark, but at least now i think i'm maybe seeing the outlines of some things :)
-
» nomad hands jbk an Emulex trade-show swag flashlight
-
jbk
i'd rather have a patch that fixes this :)
-
jbk
does ::switch in kmdb take the cpuid, or the address of a cpu_t ?
-
danmcd
I think the addr. ::help text seems to indicate that.
-
danmcd
`::walk cpu` *should* get you appropriate addresses.
-
danmcd
Ahhh so does `::cpuinfo` , @jbk
-
jbk
yeah, didn't help.. still get Errno 22 trying to get at those CPUs registers...
-
danmcd
Ouch ouch ouch.
-
danmcd
I don't suppose use of `::stacks` can tell you if something is in a CPU-initialization function?
-
jbk
i've been trying to gather any data that seems potentially interesting in the ticket (not sure if you've seen it), though my unfamiliarity with the interrupt stuff at this low level means I'm not always sure what is or is not interesting
-
jbk
i see one cpu ONPROC stuck in disp_lock_exit
-
richlowe
"stuck" how?
-
richlowe
disp locks could cause you trouble
-
jbk
it's been sitting there for 1+ hour
-
jbk
in a call to lock_clear_splx
-
jbk
at least doing a ::findstack on the value of that CPU's cpu_thread value
-
jbk
(which I don't know if that's interesting, or if there might be something else more useful)
-
jbk
this is from kmdb
-
jbk
one thing I'm wondering is that writing to an apic register in apic mode is apparently a serializing instruction, in x2apic mode it is explicitly not
-
jbk
and so now have we maybe tacitly been relying on the apic behavior and have just gotten lucky?
-
jbk
i mean, people have been using x2apic mode for a while, so i kinda discount that... but also given this is a dual CPU (64 core/ea) system... maaaaaaaaybe (a very big maybe) we haven't done much with x2apics and multi-cpus and that could explain why it's worked in the past???
-
richlowe
jbk: disp locks can really ruin your day here
-
jbk
(and all of this code is unmodified from upstream)
-
richlowe
I can't remember how to tell post-mortem if you have taken a disp lock regularly, or while high
-
jbk
i guess as a bit of a shot in the dark i can try inserting mfence; lfence before any x2apic msr writes just as a test...
-
jbk
we do it for cross calls (at least atomic_or_ulong() is used for that... i'm assuming that's ok.. but it's also using the address on the stack)
-
jbk
(x2apic_send_ipi)
-
jbk
but not the other writes
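
A rough sketch of that experiment, for anyone following along: fence first, then do the MSR write. Everything prefixed `demo_` is hypothetical and is not the code in apix_regops.c; `demo_wrmsr` is a stand-in that only records the write so this can run in userland, where a real `wrmsr` would fault. The `mfence; lfence` pair is the sequence Intel's SDM suggests when earlier stores need to be visible before an x2APIC MSR write.

```c
#include <stdint.h>

#define DEMO_LOG_MAX 8

/* One recorded MSR write (hypothetical bookkeeping for the sketch). */
typedef struct {
	uint32_t msr;
	uint64_t val;
} demo_msr_write_t;

static demo_msr_write_t demo_log[DEMO_LOG_MAX];
static unsigned demo_log_len;
static unsigned demo_fence_count;

/* Order earlier stores and loads ahead of the following MSR write. */
static void
demo_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("mfence; lfence" ::: "memory");
#else
	/* Non-x86 fallback so the sketch still builds elsewhere. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
	demo_fence_count++;
}

/* Stand-in for wrmsr: records the write instead of touching hardware. */
static void
demo_wrmsr(uint32_t msr, uint64_t val)
{
	if (demo_log_len < DEMO_LOG_MAX) {
		demo_log[demo_log_len].msr = msr;
		demo_log[demo_log_len].val = val;
		demo_log_len++;
	}
}

/* Every x2APIC register write goes through the fence first. */
static void
demo_x2apic_write(uint32_t msr, uint64_t val)
{
	demo_fence();
	demo_wrmsr(msr, val);
}
```

Calling `demo_x2apic_write(0x80b, 0)` (0x80b being the x2APIC EOI MSR) would fence once and then log a single write.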
-
jbk
richlowe: we've not made any changes here from stock illumos-gate...
-
jbk
and i've never dealt with disp locks before, so that'd be a whole new thing to dig into...
-
danmcd
I have seen the ticket updates @jbk.
-
» danmcd wonders if X2APIC stuff like this is yet-another-reason why Oxide built their own HW ?
-
richlowe
you would have to build way more of your own hardware than they did to avoid this
-
danmcd
Oh damn.
-
jbk
yeah, it's built into the CPU itself IIUC
-
jbk
according to intel's docs, there's only a small number of differences between the apic and x2apic
-
jbk
the big one being using msrs instead of MMIO as well as supporting 32-bit apic ids
-
jbk
instead of 8-bit
-
jbk
(so there's one register access that's a bit different between the two because apic you need to shift & mask to get the id)
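
To make the shift & mask difference concrete, here is a minimal sketch. The function and macro names are made up for illustration and are not the illumos ones; the layout is the documented one, where the xAPIC ID register (MMIO offset 0x20) keeps an 8-bit ID in bits 31:24 and the x2APIC ID MSR (0x802) holds the full 32-bit ID.

```c
#include <stdint.h>

/* xAPIC: the ID sits in the top byte of the 32-bit ID register. */
#define DEMO_XAPIC_ID_SHIFT	24
#define DEMO_XAPIC_ID_MASK	0xffu

/* Extract the 8-bit APIC ID from a raw xAPIC ID register value. */
static uint32_t
demo_xapic_id(uint32_t reg)
{
	return ((reg >> DEMO_XAPIC_ID_SHIFT) & DEMO_XAPIC_ID_MASK);
}

/* x2APIC: the low 32 bits of the MSR value are the ID, no shifting. */
static uint32_t
demo_x2apic_id(uint64_t msr_val)
{
	return ((uint32_t)msr_val);
}
```

So a raw xAPIC register value of 0x2a000000 decodes to ID 0x2a, while an x2APIC MSR value of 0x105 is simply ID 0x105, which has no 8-bit equivalent.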
-
jbk
there's also one bit in a register that's no longer relevant
-
jbk
so it _shouldn't_ work that different from the apic code
-
jbk
(I guess I should say each core gets its own apic)
-
sommerfeld
each core or each thread? (haven't looked at apics since before hyperthreading...)
-
jbk
hrm.. the intel docs aren't very clear on that.. it could be every thread
-
jbk
but either way, it's not a discrete component from the cpu package
-
jbk
I don't have access to the PCI or PCIe spec -- but in the apic manual, it describes the MSI message as having an 8-bit destination id... for apic ids < 256, these values match
-
jbk
for the apic ids >= 256, it seems to be truncating the value...(and maybe relying on the IO-APIC???)
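
A small sketch of why the truncation falls out of the format, assuming the compatibility-format MSI address from the Intel manuals: 0xFEE in bits 31:20 and the destination ID in bits 19:12. The `demo_` names are hypothetical, and the other address bits (redirection hint, destination mode) are left at zero for simplicity.

```c
#include <stdint.h>

#define DEMO_MSI_ADDR_BASE	0xfee00000u
#define DEMO_MSI_DEST_SHIFT	12
#define DEMO_MSI_DEST_MASK	0xffu

/* Build a compatibility-format MSI address; only 8 ID bits fit. */
static uint32_t
demo_msi_addr(uint32_t apic_id)
{
	return (DEMO_MSI_ADDR_BASE |
	    ((apic_id & DEMO_MSI_DEST_MASK) << DEMO_MSI_DEST_SHIFT));
}

/* Recover the destination ID field from an MSI address. */
static uint32_t
demo_msi_dest(uint32_t addr)
{
	return ((addr >> DEMO_MSI_DEST_SHIFT) & DEMO_MSI_DEST_MASK);
}
```

With this layout, APIC ID 0x105 produces the same address as APIC ID 0x05, so without something else supplying the upper bits the interrupt would target the wrong CPU.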
-
richlowe
this now sounds familiar to me in another context
-
richlowe
and I think you might be absolutely screwed
-
richlowe
gimme a sec
-
jbk
when I set ddidebug to 0x401, it hangs pretty early starting the CPUs; with the added lfence and sfence instructions to apix_regops.c, it now gets back to where it was at least with mlxcx
-
jbk
it appears to be using a 32-bit 'address' in the msi-x table for the device..
-
jbk
but i feel like i'm maybe missing a step in here somewhere
-
gitomat
[illumos-gate] 15972 ZFS getattr could avoid many kidmap_cache_lookup_X calls -- Gordon Ross <gwr⊙rc>
-
gitomat
[illumos-gate] 17677 SMB server on ZFS can avoid many kidmap_getXbyY calls -- Gordon Ross <gwr⊙rc>
-
gitomat
[illumos-gate] 17678 zfs could avoid kidmap when there are no subgroups -- Matt Barden <mbarden⊙rc>
-
jbk
could it be related to interrupt remapping/virtualization?
-
jbk
do we need to maybe limit interrupts for a device to cores within a single physical CPU
-
jbk
?
-
jbk
as a side note, I'd really like us to prefer using C99-style initialization for the 'ops' structs (where we collect a bunch of function pointers that are meant to be overridden)
-
jbk
would make it a lot easier to see where they're getting assigned (or at least a lot less of a pain)
-
jbk
since we seem to override specific ones..
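
For illustration, a minimal sketch of what C99-style initialization of an ops struct looks like. The struct, its members, and the functions are all invented for the example and do not correspond to any real illumos ops vector.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops vector: a bundle of overridable function pointers. */
typedef struct demo_reg_ops {
	uint64_t (*ro_read)(uint32_t reg);
	void (*ro_write)(uint32_t reg, uint64_t val);
	void (*ro_send_eoi)(uint32_t irq);
} demo_reg_ops_t;

static uint64_t demo_regs[4];
static unsigned demo_x2_writes;

static uint64_t
demo_read(uint32_t reg)
{
	return (demo_regs[reg & 3]);
}

static void
demo_write(uint32_t reg, uint64_t val)
{
	demo_regs[reg & 3] = val;
}

/* Variant write path that also counts how often it is used. */
static void
demo_x2_write(uint32_t reg, uint64_t val)
{
	demo_regs[reg & 3] = val;
	demo_x2_writes++;
}

static void
demo_eoi(uint32_t irq)
{
	(void) irq;
	demo_regs[0] = 0;
}

/* Each member is named at the point of assignment, so grep finds it. */
static const demo_reg_ops_t demo_xapic_ops = {
	.ro_read = demo_read,
	.ro_write = demo_write,
	.ro_send_eoi = demo_eoi,
};

/* The overridden member stands out; unnamed members default to NULL. */
static const demo_reg_ops_t demo_x2apic_ops = {
	.ro_read = demo_read,
	.ro_write = demo_x2_write,
};
```

Searching for `.ro_write =` then turns up every place the pointer is set, which is much harder with positional initializers or assignments scattered through init code.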
-
sommerfeld
stackoverflow.com/questions/6702814…s-there-a-need-to-poll-the-chosen-m has text which suggests that there is something in the iommu which will map the 8-bit cpu id in MSI into a 32-bit x2apic processor id.
-
jclulow
rzezeski: I give you permission to sunset token ring :P
-
rzezeski
jclulow: thank you, it's on my list
-
richlowe
sommerfeld: that matches what I'm trying to remember too, but the people involved aren't available at the moment
-
jbk
hrm.. the intel vt for directed i/o document does suggest that in x2apic mode, the upper 24 bits of the apic id go in the upper 40 bits of the address register...
-
jbk
worth a shot, though it does lead to 'is the support of this on the device tied to any specific capabilities?'
-
jbk
(i.e. is it possible to have devices that don't support all 64-bits of the msi address register?)