-
jbk
it's probably something i need to dig into.. IIRC, vnics will take a HW ring (if available) from the underlying device, but not sure if it's just 1 tx and rx, or if it'll use multiple HW rings
-
rmustacc
jbk: It's more useful to think of it as a vnic will take one group from mac today.
-
rmustacc
For i40e it'll only be one tx as we use the pseudo-rings feature for tx where each ring becomes a group.
-
rmustacc
Ultimately we'll set the number of qpairs per vsi which turns into groups basically based on the number of interrupts we got.
-
EisNerd_
rmustacc: btw how is support for mellanox/nvidia NICs, like CX6?
-
EisNerd_
rmustacc: just checked HCL, seems to be in, nice. So I may use a VF instead of a vmware vnic, when moving this VM over to HPC env
-
gitomat
[illumos-gate] 16476 lmrc panic on MFI passthru ioctl that sets I/O direction flags but has no buffer -- Hans Rosenfeld <rosenfeld⊙gho>
-
Meths
I've got an NMI-induced dump from a hung VM. Where do I go from here to find what was locked up and then hopefully why it locked up?
-
sommerfeld
Meths: mdb ::cpuinfo -v is one good starting point. are the CPUs idle because no threads are runnable, or are they all busy with spinning threads that aren't making useful forward progress?
-
sommerfeld
look at stack traces of suspect threads, build a dependency graph of who's waiting for what
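The "who's waiting for what" graph can be walked mechanically once the edges are written down. A minimal sketch, assuming each blocked thread waits on exactly one owner; the thread labels are hypothetical and would be gathered by hand from the stack traces in the dump:

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return a list of threads that wait on each other in a loop, or None.

    waits_on maps a blocked thread to the thread that owns the
    resource it is blocked on (labels are hypothetical).
    """
    for start in waits_on:
        path: List[str] = []
        seen = set()
        cur = start
        # Follow the chain of owners until it ends or repeats.
        while cur in waits_on and cur not in seen:
            seen.add(cur)
            path.append(cur)
            cur = waits_on[cur]
        if cur in seen:
            # The chain came back to a thread already on the path.
            return path[path.index(cur):]
    return None


# Hypothetical edges: t1 waits on t2, t2 on t3, t3 on t1; t4 waits on t1.
example = {"t1": "t2", "t2": "t3", "t3": "t1", "t4": "t1"}
cycle = find_wait_cycle(example)
```

If no cycle turns up, the thread at the end of the longest chain (the one everyone waits on that is not itself blocked) is usually the next one to look at.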
-
Meths
Will do, thanks
-
jbk
ok.. this seems odd... i got a packet capture from both sides, and there's a pattern of <send a few packets of data>, then the other side responds with a bunch of duplicate acks in a row, rinse, wash, repeat
-
jbk
like the source sends 40 duplicate ACKs in a row
-
jbk
i don't think this is the i40e duplicate packet bug (as i'd expect that to manifest on _every_ packet)
-
sommerfeld
jbk: that sounds like expected behavior when a packet is dropped. the receiver is sending dup acks when it gets in-sequence data as a hint to the sender to retransmit one packet to fill the hole.
-
sommerfeld
dig further into the forward sequence number vs acked sequence numbers (and sack blocks if they're present).
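One way to start on that is to pull the ack numbers seen in one direction out of the capture and count the runs. A rough sketch, assuming the ack numbers have already been exported as a plain list; it treats any repeated ack number as a duplicate, so it does not distinguish window updates or segments carrying data. The default threshold of 3 is the usual fast retransmit trigger.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def dup_ack_runs(acks: Iterable[int], threshold: int = 3) -> List[Tuple[int, int]]:
    """Return (ack_number, duplicate_count) for each run of repeated acks
    that has at least `threshold` duplicates."""
    runs = []
    for ack, group in groupby(acks):
        # The first ACK in a run is the original, the rest are duplicates.
        dups = len(list(group)) - 1
        if dups >= threshold:
            runs.append((ack, dups))
    return runs


# Made-up ack numbers for illustration only.
sample = [1000, 2000, 2000, 2000, 2000, 3000, 3000]
runs = dup_ack_runs(sample)
```

Comparing each reported ack number against the sequence numbers sent in the other direction shows which segment the receiver was still waiting for.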
-
sommerfeld
one stat to look at on the receiver is whether any packets are being dropped for bad checksums or bad CRCs
-
sommerfeld
(to be thorough, look at ethernet CRCs at each switch/router along the path)
-
sommerfeld
dup acks could also be window updates ("i've read some data so you can send more")
-
jbk
i don't have access to anything but the end systems
-
sommerfeld
if the advertised window is consistently small, you may be receiver limited (if it's nibbling one "thing" at a time out of the dump stream and taking a while to process it).
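The ceiling a small window imposes is easy to estimate: at most one window of data can be in flight per round trip. A small sketch of that arithmetic; the window size and RTT below are assumed numbers for illustration, not values from this capture.

```python
def window_limited_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(target_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: window needed to sustain target_bps."""
    return target_bps * rtt_seconds / 8


# Assumed example: 64 KiB advertised window, 50 ms round trip.
ceiling_mbit = window_limited_bps(64 * 1024, 0.050) / 1e6
# Assumed example: window needed to fill 1 Gbit/s at the same RTT.
needed = window_needed_bytes(1e9, 0.050)
```

With those assumed numbers the ceiling is roughly 10.5 Mbit/s, and filling a 1 Gbit/s path would need a window of about 6.25 MB.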
-
richlowe
jbk: first step with network weirdness, get in touch with someone who can see the network
-
richlowe
(good luck)
-
sommerfeld
jbk: another thing to try is to put something like "mbuffer" in the path (it was originally built to smooth out dump/restore to/from tape but might be useful here)
-
rzezeski
If you have captures from both sides you can see if packets are failing to traverse the switch. Also, you'll want to look at the i40e kstats to see if it's deciding to drop any packets due to full buffers and such (e.g. "rx_discards").
-
sommerfeld
richlowe: I believe this is more of a systems-performance-weirdness thing where there isn't yet clear evidence of whether or not to blame the network.
-
rzezeski
jbk: maybe also check i40e "tx_errors" on both sides
-
jbk
(side note: a nice addition to kstat would be to have a way to just show which ones increment in an interval)
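Until kstat grows something like that, two snapshots of `kstat -p` output (one `module:instance:name:statistic<TAB>value` line per statistic) can be diffed by hand. A minimal sketch; the kstat names in the sample are placeholders, not real i40e kstat names.

```python
from typing import Dict


def parse_kstat_p(text: str) -> Dict[str, int]:
    """Parse `kstat -p` style output into {key: integer value}."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue
        try:
            stats[key] = int(value)
        except ValueError:
            # Skip non-integer values such as strings and timestamps.
            continue
    return stats


def changed(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the delta for every statistic whose value changed."""
    return {k: after[k] - before[k]
            for k in after
            if k in before and after[k] != before[k]}


# Placeholder snapshots; "example" stands in for a real kstat name.
snap1 = "i40e:0:example:rx_discards\t10\ni40e:0:example:tx_errors\t0\ni40e:0:example:class\tnet\n"
snap2 = "i40e:0:example:rx_discards\t17\ni40e:0:example:tx_errors\t0\ni40e:0:example:class\tnet\n"
delta = changed(parse_kstat_p(snap1), parse_kstat_p(snap2))
```

Taking the two snapshots a known interval apart also answers the "how long ago was that" question for counters that only ever increase.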
-
jbk
i see some i40e rings where at some point there haven't been available descriptors (on tx and rx)
-
jbk
but I can't tell how long ago that was
-
jbk
this connection has been running for several days
-
jbk
hrm.. this is interesting (and likely unfortunate).. when assembling a PDU, about 25% of the segments are arriving out of order
-
jbk
err no.. retransmitted (wireshark is a bit confusing here)
-
richlowe
jbk: that kstat RFE would be wonderful, a *stat -type ticking view
-
richlowe
that wasn't impossible to read
-
richlowe
(similarly, basically all of *stat are impossible to read on a big enough machine)
-
richlowe
(intrstat's smart linewrapping actually makes me _less_ happy reading)
-
jbk
sommerfeld: there's already a piece that buffers and then sends the data over a TLS connection...
-
jbk
i am going to ask to see if there's any known issue with packet loss across the connection
-
jbk
the window size is too small, though not sure how I can bump that on an existing connection, but might have to make it re-connect and resume the transfer
-
jbk
but not sure if that'd explain the rest of the behavior i saw in that capture
-
jbk
or if there might also be some loss happening
-
jbk
i've not looked enough at the fast retransmit and such to know if 19,000 acks for the same seq are 'normal' or not though...
-
sommerfeld
well, maybe if the rx window is full and the receiver isn't reading anything because it's busy doing some zfs op that takes 10 minutes to commit...
-
sommerfeld
jbk: bumping it on an existing connection would be tricky. Best way to do that would likely be to implement pr_setsockopt() (see pr_getsockopt() in libproc)
-
richlowe
they just get the agent to do it, right?
-
richlowe
nothing fancy
-
sommerfeld
yeah
-
sommerfeld
not as simple as poking a structure field with mdb -kw
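For the reconnect-and-resume route, the receive buffer can be requested on the new socket before it connects, which matters because the window scale factor is negotiated during the handshake. A minimal sketch using the standard sockets API; the buffer size is an assumed value, and the kernel may round or cap whatever is requested.

```python
import socket


def make_socket(rcvbuf_bytes: int) -> socket.socket:
    """Create a TCP socket with a requested receive buffer size.

    The option is set before connect() so the larger buffer is known
    when the connection is established.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return s


# Assumed size for illustration only: 1 MiB.
sock = make_socket(1024 * 1024)
# Read back what the kernel actually granted; it may differ from the request.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```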
-
josephholsten
/join #freebsd
-
josephholsten
ha!
-
josephholsten
Did I miss a perl flag day or something? I know it'd been a few months, but usr/src/cmd/perl/contrib/Sun/Solaris/BSM seems mad at me
-
josephholsten
which reminds me, would it be useful to collect flag day emails in the last few years and update illumos.org/docs/developers/flagdays/?
-
jbk
check the version of perl and the one defined in your env file
-
richlowe
it is probably Marcel's change to calculate all that automatically
-
josephholsten
"did you make sure this repo was updated according to Building illumos"
-
josephholsten
sigh.
-
sommerfeld
yes, updating flagdays would be useful.
-
josephholsten
yay, first nearly useless changeset: illumos/docs #92
-
josephholsten
huh, I've never noticed illumos.org/rb/r before, looks inactive. Should the "Review Board" link on illumos.org/docs/community/conduct be updated to something more gerritty?
-
sommerfeld
yeah, it was used before gerrit.
-
richlowe
So I have a question: nightly runs `make clobber`, and then runs what amounts to a secret second heavier clobber
-
richlowe
Is this just more historical 'make clobber' being very broken (it's less broken now), or is there deeper meaning?
-
sommerfeld
you're talking about the thing after the "Get back to a clean workspace" comment?
-
richlowe
Yes
-
richlowe
I filed #16482 and #16483 (fenix?)
-
fenix
BUG 16482: nightly should tell you when it's deleting your work (New)
-
richlowe
oh, it can't do two
-
richlowe
16483 (fenix?)
-
richlowe
I give up, it says "don't delete stuff in source control, you ninny"
-
richlowe
because while having a file that matches that pattern is very unlikely (it's been years and years!), when it does happen, it requires a _lot_ of work to find out why you're getting screwed
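The check being asked for amounts to filtering the delete list against what source control knows about. A minimal sketch of that idea only; nightly itself is a shell script, the file names are hypothetical, and the tracked set is assumed to come from something like `git ls-files`.

```python
from typing import Iterable, List, Tuple


def split_deletions(candidates: Iterable[str],
                    tracked: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split cleanup candidates into (safe to delete, refused).

    A candidate is refused when it is under source control.
    """
    tracked_set = set(tracked)
    safe, refused = [], []
    for path in candidates:
        (refused if path in tracked_set else safe).append(path)
    return safe, refused


# Hypothetical paths for illustration only.
safe, refused = split_deletions(
    ["usr/src/cmd/foo/foo.o", "usr/src/cmd/foo/foo.c"],
    ["usr/src/cmd/foo/foo.c", "usr/src/cmd/foo/Makefile"],
)
```

Anything that lands in the refused list is what the build would report loudly instead of deleting.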
-
KungFuJesus
hmm, ran into this today: pastebin.com/bN2RqALX
-
KungFuJesus
I think it _might_ be this issue: illumos.org/issues/15024
-
fenix
BUG 15024: NFS can exhaust pool threads getting RPCSEC_GSS credentials (In Progress) | code.illumos.org/c/illumos-gate/+/2402
-
KungFuJesus
the machine was hung, logins weren't working and the panic finally happened after pressing the power button to invoke a shutdown
-
KungFuJesus
I can probably extract the coredump if need be
-
andyf
fenix can you show us illumos 16483?
-
fenix
BUG 16483: nightly should refuse to delete source-controlled files (New)
-
sommerfeld
richlowe: I think it's just a backstop for broken "make clobber". Especially since "make clobber" runs before "bringover" (if nightly does that); any fixes to clobber wouldn't run until the next build.
-
sommerfeld
(but, in general, I think that ordering is correct -- if a subtree is deleted from the gate, you want to "make clobber" it before its makefiles go away)