01:43:41 it's probably something i need to dig into.. IIRC, vnics will take a HW ring (if available) from the underlying device, but not sure if it's just 1 tx and rx, or if it'll use multiple HW rings
01:56:55 jbk: It's more useful to think of it as: a vnic will take one group from mac today.
01:57:25 For i40e it'll only be one tx, as we use the pseudo-rings feature for tx, where each ring becomes a group.
02:00:08 Ultimately we'll set the number of qpairs per vsi, which turns into groups, basically based on the number of interrupts we got.
08:48:17 rmustacc: btw, what about support for mellanox/nvidia NICs, like CX6?
08:50:47 rmustacc: just checked the HCL, seems to be in, nice. So I may use a VF instead of a vmware vnic when moving this VM over to the HPC env
14:44:32 [illumos-gate] 16476 lmrc panic on MFI passthru ioctl that sets I/O direction flags but has no buffer -- Hans Rosenfeld
15:17:40 I've got an NMI-induced dump from a hung VM. Where do I go from here to find what was locked up, and then hopefully why it locked up?
15:35:46 Meths: mdb ::cpuinfo -v is one good starting point. Are the CPUs idle because no threads are runnable, or are they all busy with spinning threads that aren't making useful forward progress?
15:38:24 look at stack traces of suspect threads, build a dependency graph of who's waiting for what
15:44:28 Will do, thanks
17:01:20 ok.. this seems odd... i got a packet capture from both sides, and there's a pattern of one side sending a chunk of data, then the other side responding with a bunch of duplicate acks in a row, rinse, wash, repeat
17:07:30 like the source sends 40 duplicate ACKs in a row
17:15:34 i don't think this is the i40e duplicate packet bug (as i'd expect that to manifest on _every_ packet)
17:58:17 jbk: that sounds like expected behavior when a packet is dropped. the receiver sends dup acks when it gets out-of-order data (segments beyond the hole), as a hint to the sender to retransmit one packet to fill the hole.
18:00:32 dig further into the forward sequence number vs acked sequence numbers (and sack blocks if they're present).
18:01:42 one stat to look at on the receiver is whether any packets are being dropped for bad checksums or bad CRCs
18:02:33 (to be thorough, look at ethernet CRCs at each switch/router along the path)
18:03:24 dup acks could also be window updates ("i've read some data so you can send more")
18:07:58 i don't have access to anything but the end systems
18:10:41 if the advertised window is consistently small, you may be receiver-limited (if it's nibbling one "thing" at a time out of the dump stream and taking a while to process it).
18:18:36 jbk: first step with network weirdness: get in touch with someone who can see the network
18:18:44 (good luck)
18:24:07 jbk: another thing to try is to put something like "mbuffer" in the path (it was originally built to smooth out dump/restore to/from tape, but might be useful here)
18:29:24 If you have captures from both sides, you can see if packets are failing to traverse the switch. Also, you'll want to look at the i40e kstats to see if it's deciding to drop any packets due to full buffers and such (e.g. "rx_discards").
18:33:22 richlowe: I believe this is more of a systems-performance-weirdness thing, where there isn't yet clear evidence of whether or not to blame the network.
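On the vnic ring/group question from earlier: one quick way to see how mac has carved a NIC's hardware rings into groups, and which clients own them, is dladm's hardware-resource view. A sketch only; the link name i40e0 is a placeholder:

    dladm show-phys -H i40e0    # list ring groups, rings, and the mac clients using them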
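On the NMI dump: a minimal sketch of the triage flow suggested above, assuming the dump has been saved with savecore; the thread and mutex addresses below are placeholders:

    mdb unix.0 vmcore.0                  # open the saved dump, e.g. in /var/crash/<hostname>
    > ::cpuinfo -v                       # per-CPU state: idle, or stuck spinning?
    > ::stacks                           # deduplicated kernel stacks; look for clusters of blocked threads
    > fffffe0012345678::findstack -v     # full stack of one suspect thread (placeholder address)
    > fffffe0087654321::mutex            # who owns the lock it's blocked on (placeholder address)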
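On the dup-ack capture: a rough sketch for quantifying what both sides saw, using tshark's standard tcp.analysis display filters, plus the NIC-level counter mentioned above (the capture filename is a placeholder):

    # how many segments each side's capture flags as suspicious
    tshark -r capture.pcap -Y tcp.analysis.duplicate_ack  | wc -l
    tshark -r capture.pcap -Y tcp.analysis.retransmission | wc -l
    # NIC-level drops on the receiver; a nonzero, still-climbing counter points at full rx buffers
    kstat -p 'i40e:*:*:rx_discards'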
18:36:00 jbk: maybe also check i40e "tx_errors" on both sides
18:36:28 (side note: a nice addition to kstat would be a way to show just the ones that increment in an interval)
18:36:47 i see some i40e rings where at some point there haven't been available descriptors (on tx and rx)
18:36:53 but I can't tell how long ago that was
18:37:03 this connection has been running for several days
18:53:27 hrm.. this is interesting (and likely unfortunate).. when assembling a PDU, about 25% of the segments are arriving out of order
18:54:09 err no.. retransmitted (wireshark is a bit confusing here)
19:32:24 jbk: that kstat RFE would be wonderful, a *stat-type ticking view
19:32:32 that wasn't impossible to read
19:32:47 (similarly, basically all of *stat are impossible to read on a big enough machine)
19:33:25 (intrstat's smart line-wrapping actually makes me _less_ happy reading it)
19:46:55 sommerfeld: there's already a piece that buffers and then sends the data over a TLS connection...
19:47:09 i am going to ask to see if there's any known issue with packet loss across the connection
19:49:10 the window size is too small, though not sure how I can bump that on an existing connection; might have to make it re-connect and resume the transfer
19:49:25 but not sure if that'd explain the rest of the behavior i saw in that capture
19:49:31 or if there might also be some loss happening
19:49:58 i've not looked enough at fast retransmit and such to know if 19,000 acks for the same seq are 'normal' or not though...
20:09:19 well, maybe if the rx window is full and the receiver isn't reading anything because it's busy doing some zfs op that takes 10 minutes to commit...
20:17:53 jbk: bumping it on an existing connection would be tricky. Best way to do that would likely be to implement pr_setsockopt() (see pr_getsockopt() in libproc)
20:22:05 they just get the agent to do it, right?
20:22:06 nothing fancy
20:28:05 yeah
20:34:21 not as simple as poking a structure field with mdb -kw
21:10:53 /join #freebsd
21:11:07 ha!
21:25:59 Did I miss a perl flag day or something? I know it'd been a few months, but usr/src/cmd/perl/contrib/Sun/Solaris/BSM seems mad at me
21:27:02 which reminds me, would it be useful to collect the flag day emails from the last few years and update https://illumos.org/docs/developers/flagdays/?
21:27:53 check the version of perl and the one defined in your env file
21:29:24 it is probably Marcel's change to calculate all of that automatically
21:33:34 "did you make sure this repo was updated according to Building illumos"
21:33:41 sigh.
21:52:42 yes, updating flagdays would be useful.
22:27:05 yay, first nearly useless changeset: https://github.com/illumos/docs/pull/92
22:32:51 huh, I've never noticed https://illumos.org/rb/r/ before, looks inactive. Should the "Review Board" link on https://illumos.org/docs/community/conduct/ be updated to something more gerritty?
22:41:11 yeah, it was used before gerrit.
22:52:10 So I have a question: nightly runs `make clobber`, and then runs what amounts to a secret second, heavier clobber
22:53:18 Is this just more historical 'make clobber' being very broken (it's less broken now), or is there deeper meaning?
22:54:51 you're talking about the thing after the "Get back to a clean workspace" comment?
22:57:49 Yes
22:57:56 I filed #16482 and #16483 (fenix?)
22:57:57 BUG 16482: nightly should tell you when it's deleting your work (New)
22:57:57 ↳ https://www.illumos.org/issues/16482
22:58:04 oh, it can't do two
22:58:07 16483 (fenix?)
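On the kstat interval RFE above: until something like that exists, a crude approximation is to diff two -p snapshots. A sketch only; the interval and temp file names are arbitrary:

    # take two full snapshots ten seconds apart
    kstat -p > /tmp/ks.1; sleep 10; kstat -p > /tmp/ks.2
    # show only the statistics whose values changed during the interval
    diff /tmp/ks.1 /tmp/ks.2 | grep '^>'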
22:58:18 I give up, it says "don't delete stuff in source control, you ninny"
22:59:33 because while having a file that matches that pattern is very unlikely (it's been years and years!), when it does happen, it requires a _lot_ of work to find out why you're getting screwed
23:10:45 hmm, ran into this today: https://pastebin.com/bN2RqALX
23:11:49 I think it _might_ be this issue: https://www.illumos.org/issues/15024
23:11:50 → BUG 15024: NFS can exhaust pool threads getting RPCSEC_GSS credentials (In Progress) | https://code.illumos.org/c/illumos-gate/+/2402
23:12:09 the machine was hung, logins weren't working, and the panic finally happened after pressing the power button to invoke a shutdown
23:12:35 I can probably extract the coredump if need be
23:36:18 fenix: can you show us illumos 16483?
23:36:19 BUG 16483: nightly should refuse to delete source-controlled files (New)
23:36:19 ↳ https://www.illumos.org/issues/16483
23:56:18 richlowe: I think it's just a backstop for broken "make clobber". Especially since "make clobber" runs before "bringover" (if nightly does that); any fixes to clobber wouldn't run until the next build.
23:57:46 (but, in general, I think that ordering is correct -- if a subtree is deleted from the gate, you want to "make clobber" it before its makefiles go away)
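On bug 16483: a hypothetical sketch of the guard it asks for, in nightly's own shell idiom. Assumptions here: git as the SCM, nightly's $CODEMGR_WS workspace variable, and $f as the stray file the cleanup is about to remove:

    # before the post-clobber cleanup removes $f, refuse if it is under source control
    if git -C "$CODEMGR_WS" ls-files --error-unmatch "$f" >/dev/null 2>&1; then
        echo "nightly: refusing to delete source-controlled file: $f" >&2
    else
        rm -f "$f"
    fi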