04:42:07 <gitomat> [illumos-gate] 14320 loader.efi: Do not use as frame buffer BLT-only GOPs. -- Alexander Motin <mav⊙Fo>
08:57:58 <wiedi> jbk: I recently came across https://www.illumos.org/issues/14977 again and was wondering if we couldn't just do something like "#define NN_NUMBUF_SPACE_SZ (NN_NUMBUF_SZ + 1)" for places that use NN_UNIT_SPACE like dd?
08:58:00 <fenix> → BUG 14977: NN_NUMBUF_SZ is too small after 12258 (New) | https://code.illumos.org/c/illumos-gate/+/2354
10:22:26 <gitomat> [illumos-gate] 16477 lmrc: move MFI definitions into their own headers -- Hans Rosenfeld <rosenfeld⊙gho>
13:34:41 <jbk> wiedi: it actually gets messy unfortunately
13:35:13 <jbk> since the current implementation always tries to make the value 'fit' the size of the buffer
13:36:10 <jbk> so adding an extra space in general causes output that, while not incorrect, is probably not what you want
13:37:03 <jbk> (the API probably needs to be redone a bit to better express the different use cases, which being 'private' isn't really a problem other than having the time to do it)
14:10:20 <wiedi> but if we use the bigger buffer only for the case where we have the space shouldn't that work? The first loop to find the suffix is independent of the buffer size and the second one includes the spc part.. so it should come out the same just with the additional space, no?
15:39:30 <jbk> i'm not too familiar with it, but would tcp fast retransmit be a possible reason for a system to send the same ACK 30+ times in a row?
16:02:57 <sommerfeld> jbk: something like that.   I believe TCP will send an immediate ack if it gets an out-of-order segment.  so if you get 1, 3, 4, 5, 6, ... 31, you would send dup acks for 1 and eventually the sender will see 3 dup acks for 1 and send 2.
16:03:14 <sommerfeld> (oversimplifying; tcp acks bytes not packets)
16:03:29 <sommerfeld> it's the receiver behavior that enables fast retransmit on the sender.
16:10:35 <sommerfeld> if SACK is enabled, the "duplicate" acks may have sack options that show that 3-N have been received.
16:11:19 <sommerfeld> how many dups you get would depend on how many packets are in flight
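[editor's sketch] The receiver behavior sommerfeld describes can be illustrated with a toy model. This is a simplification under stated assumptions: it counts whole segments rather than bytes, ignores SACK and delayed ACKs, and the function names (`receiver_acks`, `fast_retransmit_trigger`) are made up for illustration. An "ACK for N" here means "everything up to segment N has arrived in order", matching the phrasing above.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent after each arriving segment.

    Segments are numbered from 1. The ACK value is the highest segment
    received in order (a simplification: real TCP acks bytes).
    """
    received = set()
    next_expected = 1
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected - 1)
    return acks


def fast_retransmit_trigger(acks, threshold=3):
    """Return (ack value, index) where the sender has seen `threshold`
    duplicate ACKs in a row, or None if that never happens."""
    dups = 0
    last = None
    for i, ack in enumerate(acks):
        if ack == last:
            dups += 1
            if dups == threshold:
                return ack, i
        else:
            last, dups = ack, 0
    return None


# Segment 2 is lost: the receiver sees 1, 3, 4, ..., 31.
arrivals = [1] + list(range(3, 32))
acks = receiver_acks(arrivals)
trigger = fast_retransmit_trigger(acks)
```

In this model every arrival after the hole repeats the ACK for segment 1, so the number of duplicates tracks how many segments were in flight behind the loss, and the sender's threshold is reached on the third duplicate. Once the retransmitted segment 2 arrives, the ACK jumps straight to 31.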
16:47:53 <jbk> this is the same one over 200x in a row (in one instance)
17:26:46 <sommerfeld> jbk: what's the RTT and the window size on the connection?
17:29:45 <jbk> hrm.. one thing that seems odd (at least at first glance).. from initiating side, RTT is about 0.5-0.6 ms, but the same connection on the destination is around 7.5ms
17:31:05 <jbk> the send window seems to stay around 2-4k then jumps up briefly to 400k every 15-20s
17:31:11 <sommerfeld> packet traces from both ends would be instructive.
17:32:06 <jbk> from some other testing, we're starting to wonder if maybe the receiving application is just now keeping up with reading()
17:32:09 <jbk> err not
17:33:01 <jbk> unfortunately, it's written in go which makes it challenging to introspect
17:33:30 <gitomat> [illumos-gate] 16516 clone: smatch errors -- Toomas Soome <tsoome⊙mc>
17:33:32 <sommerfeld> are the acks absolutely dups or do they have sack blocks or window updates?
17:36:19 <sommerfeld> (if the receiver is periodically reading large chunks you might see that behavior - send window is full until it isn't, then the sender sends the next burst)
17:37:14 <sommerfeld> tracing receiver syscalls might show the bursty read behavior.
17:38:16 <jbk> hrm...
17:39:02 <jbk> what i see is the socket is non-blocking, and it appears to be reading 32k at a time in a loop until it gets EAGAIN
17:39:31 <jbk> though in the middle of this, because you know OS threads are bad (/s), the actual lwp doing the reads is moving around
17:39:44 <jbk> which probably isn't causing a problem, but i'd imagine isn't helping either
17:39:59 <jbk> or at least isn't doing anyone any favors
17:41:39 <sommerfeld> treating os threads like virtual cpus while not worrying about cpu affinity.
17:45:22 <sommerfeld> so could it be: reads a burst until it gets EAGAIN (draining the receive queue) then it goes off and chews on it for a bit before trying to read any more?
17:45:43 <sommerfeld> and the sender refills the buffer and the receiver doesn't get around to reading for a bit?
17:46:49 <sommerfeld> or is it: link is noisy, packet gets dropped, and receiver's waiting for the sequence space hole to be filled in
17:47:06 <jbk> i suspect it's probably the former
17:47:22 <jbk> at least in syscalls, once it hits EAGAIN, it re-arms the fd (via port_associate), then lwp_park
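[editor's sketch] The read pattern described here (non-blocking socket, fixed-size reads in a loop until EAGAIN) looks roughly like the following. This is only an illustration, not the application's code: it uses a local `socketpair` instead of a TCP connection, a small chunk size instead of 32k so the demo cannot fill a socket buffer, and the name `drain` is hypothetical. The re-arm via `port_associate` and the `lwp_park` that follow EAGAIN are not modelled.

```python
import socket


def drain(sock, chunk=32 * 1024):
    """Read from a non-blocking socket until it would block.

    Returns (total bytes read, number of successful reads).
    """
    total = 0
    reads = 0
    while True:
        try:
            data = sock.recv(chunk)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: the receive queue is empty for now.
            return total, reads
        if not data:
            # Zero-length read: the peer closed the connection.
            return total, reads
        total += len(data)
        reads += 1


# Demo with a local socket pair standing in for the real connection.
sender, receiver = socket.socketpair()
receiver.setblocking(False)
sender.sendall(b"x" * 4096)

first_total, first_reads = drain(receiver, chunk=1024)
second_total, second_reads = drain(receiver, chunk=1024)

sender.close()
receiver.close()
```

The first call empties the queue in several reads; the second returns immediately with nothing. Whatever the caller does between hitting EAGAIN and the next drain is time during which the receive buffer can refill and the advertised window can shrink, which is the bursty pattern being hypothesized.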
17:49:44 <jbk> ooh.. truss -d might be helpful here (i've never actually used that, so had to see if it existed :P)
17:52:01 <jbk> it looks like it's spending about 3-4 ms to read all of the data and doesn't start reading until 2ms later
17:52:15 <jbk> though not sure how much probe effect truss would add
17:52:28 <jbk> might make more sense to switch to dtrace
17:59:19 <sommerfeld> so there may be some, uh, water hammer/pogo oscillation going on.
18:16:59 <jbk> what i'm wondering is if the TLS decryption is happening on the same goroutine as either the receive code or the bit that writes it out to a pipe.. i suspect ideally you'd want each bit running on its own lwp (so executing in parallel) with large enough buffers that whichever one is the slowest never has to wait for more work
18:32:49 <jclulow> jbk: I gather that goroutines are, these days, preemptively multitasked -- but I also wonder if that support works or even exists on systems other than, say, Linux
18:33:18 <jclulow> Because if it doesn't, I can imagine some serious queue blocking latency bubbles that just don't appear on systems for which the preemption works
18:46:40 <jbk> well you know linux is the only OS that exists... (/s obviously)
18:48:52 <jbk> (one of my major annoyances with go is that it really was made to only run on linux, and there's a fairly noticeable impedance mismatch using it on any other platform.. even if you ignore illumos)
19:07:48 <jclulow> All I can offer are other languages and toolchains haha
19:14:19 <jbk> haha
19:40:38 <nomad> "The nfssec.conf file should not be edited by a user."
19:40:45 <nomad> then who should be editing it? 
19:40:55 * nomad loves enigmatic manpages.
19:46:36 <richlowe> kclient
19:46:42 <richlowe> which is undocumented, because all of kerberos is bad
19:47:01 <richlowe> oh, no, it does say so in kclient(8)!
19:50:29 <sommerfeld> never mind that a comment in nfssec.conf tells you to edit it.
19:51:44 <nomad> richlowe, I eventually found that manpage online. I don't have it on my host because I haven't needed to use kclient. We use AD and I've never needed kclient on the fileservers.
19:52:35 <nomad> sommerfeld, yeah. I edited it but I'm still being told sys=krb5 is invalid when I try to set it in sharenfs. I presume kclient does something more than just editing that file.
19:52:50 <richlowe> sommerfeld: yeah, the comment is telling you to do what kclient does, for what kclient does
19:54:04 <nomad> I tried restarting nfs/server, still told it is invalid. I wonder what else I need to tickle.
19:54:37 <nomad> and if I care enough to actually try to find out. (I really *should* care, given NFS's delightful security, but... <sigh>)
19:55:32 <jclulow> Is there a way to prevent a process from creating new contracts
19:56:29 <jclulow> Like, if you want to make sure all of the processes that are in contract A have any children they create also contained within contract A (not some new contract B) so that you can just torch the whole lot at once
19:56:55 <jbk> maybe a resource control? (haven't looked at what exists though)
19:57:00 <jbk> only thing i could think of offhand
19:57:20 <jclulow> There are a number of contract privileges but there doesn't seem to be one that covers this per se
19:57:26 <jclulow> Probably should just add one
19:57:32 <jbk> hrm.. that's at the project level, so probably not quite useful for this
19:59:01 <jclulow> Should probably also look at adding the "proc_self" privilege too, to be able to create child processes that can't dork with other child processes via /proc