16:24:47 when using -z assert-deflib and -z fatal-warnings with ld, is the position of the arguments significant?
16:34:10 I'm building this (3rd party) library to link against an illumos proto area; it's using -L$PROTO/lib -L$PROTO/usr/lib before -l and has -z assert-deflib -zwarnings (but those are after), but it's still bailing
16:36:41 it's linking against the proto libuuid, but it seems like when it's going through all the transitive dependencies, it's looking at the build system
16:38:32 which is then tripping because (bug) the release of omnios libidspace accidentally got labeled with an ILLUMOS_1.0 version
16:40:32 but it seems like it's maybe missing something to tell it 'don't look in /lib:/usr/lib'
18:40:05 Is that somehow because proto doesn't include an ILLUMOS_1.0 libidspace?
18:40:18 (and I still can't unravel the git history around how that happened)
18:58:14 yeah, there are two issues.. the first is (AFAICT) that it shouldn't be looking at /lib or /usr/lib, but because it is, it's picking up the ILLUMOS_1.0 in libidspace which shouldn't be there
18:59:23 trying to pass '-Y P,' doesn't seem to help because gcc is first passing '-Y P,lib:/usr/lib' and ld just uses that
18:59:32 which is why I was wondering about the position
18:59:49 because it seems like the -z assert-deflib should be causing an error here
19:00:09 but it's not (though maybe that check happens after? so this failure is happening before that can happen?)
19:00:56 if you're getting to ld via gcc, does gcc's -nodefaultlibs help?
19:04:38 (two seconds of experimentation with gcc -v -nodefaultlibs suggests it might)
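
A minimal sketch of the kind of invocation being discussed, assuming the link goes through gcc; the proto path, object files, and library names are placeholders:

    # placeholder proto area path and inputs
    PROTO=/path/to/proto
    gcc -shared -nodefaultlibs \
        -L$PROTO/lib -L$PROTO/usr/lib \
        -Wl,-zassert-deflib -Wl,-zfatal-warnings \
        -o libfoo.so.1 foo.o -luuid -lc
    # -nodefaultlibs means -lc has to be listed explicitly; adding -v prints
    # the ld command line gcc constructs, which shows whether a
    # -Y P,/lib:/usr/lib default search path is still being passed through.
    # Options can also be injected ahead of everything else on ld's command
    # line via the LD_OPTIONS environment variable, e.g.:
    LD_OPTIONS="-zassert-deflib -zfatal-warnings" gcc -shared ...
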
19:09:02 ok, random illumos developer survey: my main build machine is an old Xeon E5630 with 4 cpus (8 hyperthreads) and it's not enough. But I don't want to go too big. Anyone have a good sense of a sweet spot for build machine scaling these days?
19:11:04 andyf: as a quick and dirty hack, i was looking to see if there's a way to remove that version using elfedit
19:14:33 (or, failing that, what size build machine are you happy with?)
19:22:21 sommerfeld: What's the budget?
19:22:43 And how much do you care about power?
19:23:07 I think the machine shapes and sizes are going to be predominantly based upon that.
19:26:25 somewhat flexible on budget. looking at single-socket AMD motherboards
19:27:27 I would probably go to an AM5-based system versus paying for an EPYC.
20:22:28 .O Brain 16 bit... something new every day.
20:24:15 hrm.. (unrelated subject) any good strategies for looking at a tcp connection to tell if an alternate congestion alg might work better?
20:38:53 jbk: snarky answer is, "yes, an alternate congestion algorithm would work better". Serious answer: there are tons of stats & parameters to look at (bandwidth-delay product, retransmission behavior, etc.) One tool I'm aware of for visualizing performance is xplot -- which lets you zoom into a time vs sequence space plot.
20:39:39 what sort of symptoms are you seeing?
20:49:05 if you're going to look at an alternate CC algorithm I suggest looking at BBR (mainly because I used to work near the people who did it and followed some of the work). it attempts to determine the bandwidth of your share of the bottleneck link on the path and then send at that average rate (with periodic tweaks up and down to attempt to sense increases/decreases in the bottleneck rate).
20:52:43 trying to look at each component in the chain... we have a long-running zfs send -> network -> zfs recv process that seems like it's going slower than it should
20:53:11 looking at the dmu_send thread (I can't recall the exact name off the top of my head), it's spending about 40s of every minute waiting to write
20:53:28 so that suggests it's not a matter of being able to get the data off the disk fast enough (it's an all-SSD pool as well)
20:53:51 and on the recv side, it's never writing out enough that it has to sync more frequently than the default 5s for the txg to close
20:54:43 so trying to look at all the bits in between (i'm sort of suspecting some sort of interaction with the process that is sending the zfs send stream over the network, but instrumenting that has proved challenging)
20:54:51 at the same time in the lab, a similar setup is flying
20:54:54 on slower disks
20:55:12 but in the slow case, the two sides are at opposite ends of an approx. 20-mile fiber link
20:55:33 but latency (at least using ping, i figure that's probably worst case, being lowest priority, etc) is 0.4ms
20:55:42 which doesn't seem excessive
20:56:02 what's the link bandwidth, and what do you have the TCP windows sized at?
20:56:19 but thought it might be nice to look at the congestion as well just to see if that could be a factor
20:56:46 window size should be > bandwidth * RTT or else you can't fill the link.
20:57:00 what does the RTT look like while traffic is in flight?
20:58:11 jbk: as a quick sanity check it's worth looking at `connstat -e -oall` as well as aggregating mibs with DTrace and seeing what you find
20:58:23 if you suspect the network
20:58:27 yeah i was just remembering connstat
20:58:35 i don't know why I always forget about it given how nice it is
20:59:23 I found this one quite interesting reading: https://ieeexplore.ieee.org/abstract/document/7796870
20:59:44 but ofc it is a bit specific...
21:01:31 it looks like rtt is 571 (µs I think?)
21:01:49 jbk: that would be consistent with 0.4ms ping RTT
21:02:26 I somehow had not seen connstat
21:02:37 sommerfeld: it's "newer"
21:02:41 hrm.. but even at 10gb, that'd suggest about a 625,000-byte window (assuming the full bw)
21:03:01 what window size do you see?
21:04:10 hrm.. 401096... seems like that at least is probably not helping things
21:04:41 (SWND,CWND,RWND columns for send, congestion, receive windows. CWND is computed by the congestion control algorithm)
21:04:48 yeah, bump that up a bunch.
21:05:49 (.. and CWND constrains sends, so the effective send window is IIRC the min of SWND and CWND)
21:07:04 RTO is the retransmission timeout (per the connstat man page, in ms not µs)
21:08:29 one performance trap is if you're stuck with RTO-driven retransmissions instead of fast retransmissions triggered by dup acks (extra painful if you stalled because the send window is full)
21:08:52 ooof swnd 3888 cwnd 270335
21:10:01 what's the suna that goes with that? ("The number of unacknowledged bytes outstanding at this instant")
21:10:37 oops, gotta run.
21:11:29 (that may mean you have 3888 bytes available and x00000 in flight. not necessarily cause for alarm)
21:13:03 yeah, i see swnd jumping around
21:13:27 and some rtt values around 5ms (so varying between 0.4-5ms)
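
Putting numbers on the window-size point above (the ~10 Gbit/s link rate is the assumption from the discussion), plus a hedged sketch of how the connection and buffer limits could be inspected and raised; the sizes shown are illustrative only:

    # bandwidth-delay product: the window must be at least bandwidth * RTT
    #   10 Gbit/s * 0.5 ms = 10e9 * 0.0005 / 8 bytes ≈ 625,000 bytes
    #   10 Gbit/s * 5 ms   = 10e9 * 0.005  / 8 bytes ≈ 6,250,000 bytes
    # so the observed ~400 KB window can't fill the link once RTT climbs.

    # watch the connection; swnd/cwnd/rwnd/suna/rtt/rto/retranssegs are the
    # columns mentioned above (see connstat(8) for the full field list)
    connstat -e -o all -i 1

    # raise the TCP buffer limits (max_buf caps send_buf/recv_buf, so raise
    # it first); these values are examples, not a recommendation
    ipadm set-prop -p max_buf=8388608 tcp
    ipadm set-prop -p send_buf=4194304 tcp
    ipadm set-prop -p recv_buf=4194304 tcp
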
21:17:04 hrm.. retranssegs keeps jumping by 3-4 segments every few seconds... seems like that is maybe not good
21:21:36 sommerfeld: so far, all we support is sunreno, newreno, and cubic for cc algs
21:21:50 though i know there are some other ones out there (I can't recall which offhand) that look interesting
21:22:58 IIRC, fbsd (at least at one point) also went as far as to (as they described it) have a whole second tcp stack for some newer congestion bits (though it's unclear if that's anything inherent w/ the alg or more that it was easier to do that on fbsd vs. fit it in with their existing stack)
21:37:31 do you have sack enabled?
21:37:51 yes
22:20:05 I think cubic is basically always at least as good as sunreno
22:20:16 At least I've never seen it go any slower
22:20:48 I feel like it could/should probably be the default these days
22:21:34 There's also https://www.illumos.org/issues/15019 which I feel someone could actually do (not even that much) work on and get those defaults lifted a bit as well
22:21:35 → BUG 15019: TCP and UDP default send_buf and recv_buf values should be increased (New)
22:22:42 Also there's a hearsay suggestion in there that dynamic tuning of the watermarks doesn't result in the same win as some change to the constants in the source -- but sadly there was no concrete demonstration of that behaviour, or suggestion of why
22:22:54 Alas!
22:53:17 yeah, cubic is probably the best of what's available in illumos today
23:09:58 jbk: an increase in RTT under load typically indicates a standing queue building up at the bottleneck link. the key insight in BBR is to look at how the sending rate affects RTT, with a goal to slow down sends when the RTT starts ramping up to minimize the size of that queue.
23:14:10 jbk: if you're seeing retransmissions, it might be worth digging into why. getting a tcpdump around a retransmission event (especially if you can get it on both ends) might be informative. are packets getting lost? is there packet reordering leading to spurious retransmits?
23:16:13 there could be mundane reasons for packet reordering on the send side depending on how the tx queue is picked on a multi-queue NIC.. if you hash on cpu to pick the tx queue (which is good for scalability as it reduces cache misses) and the scheduler moves your sending threads to a different cpu all the time..
23:17:32 the sending process is written in go, so i wonder if that might also be a factor since its runtime basically does its own scheduling of goroutines IIUC
23:18:06 yeah, that could cause the cpu doing the sends on the connection to bounce around a bit.
23:21:03 depending on how much cpu the sender actually needs, dinking with pbind or psrset might be informative.
23:26:55 i'm not sure offhand how MAC is doing that.. it's i40e on both sides, but over a vnic (long story), so i'm not sure how that'd impact how mac picks which ring to use
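
On making cubic the default, a small sketch using the tcp protocol properties in ipadm; check the enabled list first and add cubic if it isn't already there:

    # see which congestion control algorithms are registered and the default
    ipadm show-prop -p cong_enabled,cong_default tcp
    # add cubic to the enabled list if needed, then make it the default for
    # new connections (existing connections keep their current algorithm)
    ipadm set-prop -p cong_enabled+=cubic tcp
    ipadm set-prop -p cong_default=cubic tcp
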
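Two of the suggestions above as concrete command lines; the process name (sendprog), interface, address, and port are all hypothetical placeholders:

    # pin the go sender to one CPU so its runtime's thread migration can't
    # bounce the connection between CPUs (and, potentially, tx rings)
    pbind -b 4 $(pgrep -x sendprog)

    # capture around a retransmission burst, ideally on both ends, to see
    # whether segments are actually lost or merely reordered
    tcpdump -i vnic0 -w /tmp/send-side.pcap host 192.0.2.10 and port 5001
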