-
jbk
when using -z assert-deflib and -z fatal-warnings with ld, is the position of the arguments significant?
-
jbk
I'm linking a (3rd party) library against an illumos proto area; it's using -L$PROTO/lib -L$PROTO/usr/lib before -l<lib> and has -z assert-deflib -z fatal-warnings (but those come after), but it's still bailing
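for illustration, the shape of the link line is roughly this (library names from this thread, the rest elided):
  ld ... -z assert-deflib -z fatal-warnings -L$PROTO/lib -L$PROTO/usr/lib -luuid ...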
-
jbk
it's linking against the proto libuuid, but it seems like when it's going through all the transitive dependencies, it's looking at the build system's libraries
-
jbk
which is then tripping because (bug) the release of omnios libidspace accidentally got labeled with an ILLUMOS_1.0 version
-
jbk
but it seems like it's maybe missing something to tell it 'don't look in /lib:/usr/lib'
-
andyf
Is that somehow because proto doesn't include an ILLUMOS_1.0 libidspace?
-
andyf
(and I still can't unravel the git history around how that happened)
-
jbk
yeah, there are two issues.. the first is (AFAICT) it shouldn't be looking at /lib or /usr/lib, but because of that, it's picking up the ILLUMOS_1.0 in libidspace, which shouldn't be there
-
jbk
trying to pass '-Y P,<PROTO:LIB:LIST>' doesn't seem to help because gcc is first passing '-Y P,/lib:/usr/lib' and ld just uses that
-
jbk
which is why I was wondering about the position
-
jbk
because it seems like the -z assert-deflib should be causing an error here
-
jbk
but it's not (though maybe that check happens after? so this failure is happening before that can happen?)
-
sommerfeld
if you're getting to ld via gcc, does gcc's -nodefaultlibs help?
-
sommerfeld
(two seconds of experimentation with gcc -v -nodefaultlibs suggests it might)
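(to reproduce the check with a hypothetical test.c, compare the driver output with and without the flag:
  gcc -v test.c 2>&1 | grep -- '-Y P'
  gcc -v -nodefaultlibs test.c 2>&1 | grep -- '-Y P'
and see whether the -Y P,/lib:/usr/lib drops out of the ld invocation)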
-
sommerfeld
ok, random illumos developer survey: my main build machine is an old Xeon E5630 with 4 cpus (8 hyperthreads) and it's not enough. But I don't want to go too big. Anyone have a good sense of a sweet spot for build machine scaling these days?
-
jbk
andyf: as a quick and dirty hack, i was looking to see if there's a way to remove that version using elfedit
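(first confirming the stray version definition is there, e.g.:
  pvs -d /usr/lib/libidspace.so.1
which should list ILLUMOS_1.0 among the version definitions if the bad label is present)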
-
sommerfeld
(or, failing that, what size build machine are you happy with?)
-
rmustacc
sommerfeld: What's the budget?
-
rmustacc
And how much do you care about power?
-
rmustacc
I think the machine shapes and sizes are going to be predominantly based upon that.
-
sommerfeld
somewhat flexible on budget. looking at single-socket amd motherboards
-
rmustacc
I would probably go to an AM5 based system versus paying for an EPYC.
-
tsoome
.O Brain 16 bit... something new for every day.
-
jbk
hrm..(unrelated subject) any good strategies for looking at a tcp connection to tell if an alternate congestion alg might work better?
-
sommerfeld
jbk: snarky answer is, "yes, an alternate congestion algorithm would work better". Serious answer: there are tons of stats & parameters to look at (bandwidth-delay product, retransmission behavior, etc.) One tool I'm aware of for visualizing performance is xplot -- which lets you zoom into a time vs sequence space plot.
-
sommerfeld
what sort of symptoms are you seeing?
-
sommerfeld
if you're going to look at an alternate CC algorithm I suggest looking at BBR (mainly because I used to work near the people who did it and followed some of the work). it attempts to determine the bandwidth of your share of the bottleneck link on the path and then send at that average rate (with periodic tweaks up and down to attempt to sense increases/decreases in the bottleneck rate).
-
jbk
trying to look at each component in the chain... we have a long running zfs send -> network -> zfs recv process that seems like it's going slower than it should
-
jbk
looking at the dmu_send thread (I can't recall the exact name off the top of my head), it's spending about 40s of every minute waiting to write
-
jbk
so that suggests it's not a matter of being able to get the data off the disk fast enough (it's an all ssd pool as well)
-
jbk
and on the recv side, it's never writing out enough that it has to sync more frequently than the default 5s for the txg to close
-
jbk
so trying to look at all the bits in between (i'm sort of suspecting some interaction with the process that is sending the zfs send stream over the network, but instrumenting that has proved challenging)
-
jbk
at the same time in the lab, a similar setup is flying
-
jbk
on slower disks
-
jbk
but in the slow case, the two sides are on opposite ends of an approx 20mile fiber link
-
jbk
but latency (at least using ping; i figure that's probably worst case, ping being lowest priority, etc) is 0.4ms
-
jbk
which doesn't seem excessive
-
sommerfeld
what's the link bandwidth, and what do you have the TCP windows sized at?
-
jbk
but thought it might be nice to look at the congestion alg as well just to see if that could be a factor
-
sommerfeld
window size should be > bandwidth * RTT or else you can't fill the link.
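(worked example with an assumed 10Gb/s link and the ~0.5ms RTT here: 10e9/8 bytes/s * 0.0005s = 625,000 bytes, so a window under ~625KB can't keep the pipe full)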
-
sommerfeld
what does the RTT look like while traffic is in flight?
-
rzezeski
jbk: as a quick sanity check it's worth looking at `connstat -e -oall` as well as aggregating mibs with DTrace and see what you find
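e.g. a rough sketch for the DTrace side (mib probes fire with the counter increment in arg0):
  dtrace -n 'mib:::tcp* { @[probename] = sum(arg0); }'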
-
rzezeski
if you suspect network
-
jbk
yeah i was just remembering connstat
-
jbk
i don't know why I always forget about it given how nice it is
-
tsoome
but ofc it is a bit specific...
-
jbk
it looks like rtt is 571 (µs I think?)
-
sommerfeld
jbk: that would be consistent with 0.4ms ping RTT
-
sommerfeld
I somehow had not seen connstat
-
rzezeski
sommerfeld: it's "newer"
-
jbk
hrm.. but even at 10Gb/s, that'd suggest about a 625,000 byte window (assuming the full bw)
-
sommerfeld
what window size do you see?
-
jbk
hrm.. 401096... seems like that at least is probably not helping things
-
sommerfeld
(SWND,CWND,RWND columns for send, congestion, receive windows. CWND is computed by the congestion control algorithm)
-
sommerfeld
yeah, bump that up a bunch.
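(on illumos that'd be something like the following, values illustrative, assuming the app doesn't set SO_SNDBUF/SO_RCVBUF itself:
  ipadm set-prop -p max_buf=4194304 tcp
  ipadm set-prop -p send_buf=1048576 tcp
  ipadm set-prop -p recv_buf=1048576 tcp)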
-
sommerfeld
(.. and CWND constrains sends so the effective send window is IIRC the min of SWND and CWND)
-
sommerfeld
RTO is the retransmission timeout (per the connstat man page, in ms not us)
-
sommerfeld
one performance trap is if you're stuck with RTO-driven retransmissions instead of fast retransmissions triggered by dup acks (extra painful if you stalled because the send window is full)
-
jbk
ooof swnd 3888 cwnd 270335
-
sommerfeld
what's the suna that goes with that ("The number of unacknowledged bytes outstanding at this instant")?
-
sommerfeld
oops, gotta run.
-
sommerfeld
(that may mean you have 3888 bytes available and x00000 in flight. not necessarily cause for alarm)
-
jbk
yeah, i see swnd jumping around
-
jbk
and some rtt values around 5ms (so varying between 0.4-5ms)
-
jbk
hrm.. retranssegs keeps jumping by 3-4 segments every few seconds... seems like that is maybe not good
-
jbk
sommerfeld: so far, all we support is sunreno, newreno, and cubic for cc algs
-
jbk
though i know there's some other ones out there (I can't recall offhand) that look interesting
-
jbk
IIRC, fbsd (at least at one point) also went as far as to (as they described it) have a whole second tcp stack for some newer congestion bits (though it's unclear if that's anything inherent w/ the alg or more that it was easier to do that on fbsd vs. fitting it in with their existing stack)
-
sommerfeld
do you have sack enabled?
-
jbk
yes
-
jclulow
I think cubic is basically always at least as good as sunreno
-
jclulow
At least I've never seen it go any slower
-
jclulow
I feel like it could/should probably be the default these days
-
jclulow
There's also illumos.org/issues/15019 which I feel someone could actually do (not even that much) work on and get those defaults lifted a bit as well
-
fenix
→ BUG 15019: TCP and UDP default send_buf and recv_buf values should be increased (New)
-
jclulow
Also there's a hearsay suggestion in there that dynamic tuning of the watermarks doesn't result in the same win as some change to the constants in the source -- but sadly there was no concrete demonstration of that behaviour, or suggestions of why
-
jclulow
Alas!
-
sommerfeld
yeah, cubic is probably the best of what's available in illumos today
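(assuming the cc framework's cong_default/cong_enabled tcp properties are what your build has, switching the default is roughly:
  ipadm set-prop -p cong_default=cubic tcp
and ipadm show-prop -p cong_enabled tcp should list what's available)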
-
sommerfeld
jbk: increase in RTT under load typically indicates a standing queue building up at the bottleneck link. the key insight in BBR is to look at how the sending rate affects RTT, with a goal to slow down sends when the RTT starts ramping up to minimize the size of that queue.
-
sommerfeld
jbk: if you're seeing retransmissions that might be worth digging into why. getting a tcpdump around a retransmission event (especially if you can get it on both ends) might be informative. are packets getting lost? is there packet reordering leading to spurious retransmits?
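(e.g. with snoop on each end, interface/peer/port hypothetical:
  snoop -d i40e0 -o /var/tmp/side-a.cap host <peer> and port <port>
then line the two captures up around a retransmit event)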
-
sommerfeld
there could be mundane reasons for packet reordering on the send side depending on how the tx queue is picked on a multi-queue NIC.. if you hash on cpu to pick the tx queue (which is good for scalability as it reduces cache misses) and the scheduler moves your sending threads to a different cpu all the time..
-
jbk
the sending process is written in go, so i wonder if that might also be a factor since its runtime basically does its own scheduling of goroutines IIUC
-
sommerfeld
yeah, that could cause the cpu doing the sends on the connection to bounce around a bit.
-
sommerfeld
depending on how much cpu the sender actually needs, dinking with pbind or psrset might be informative.
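(a rough sketch, cpu numbers illustrative:
  pbind -b 4 <pid>        # nail the sender to cpu 4
or dedicate a small processor set:
  psrset -c 4 5           # prints the new set id
  psrset -b <setid> <pid>)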
-
jbk
i'm not sure offhand how MAC is doing that.. it's i40e on both sides, but over a vnic (long story), so i'm not sure how that'd impact how mac picks which ring to use