-
jbk
when using -z assert-deflib and -z fatal-warnings with ld, is the position of the arguments significant?
-
jbk
I'm linking a (3rd party) library against an illumos proto area; it's using -L$PROTO/lib -L$PROTO/usr/lib before -l<lib> and has -z assert-deflib -z fatal-warnings (but those come after), but it's still bailing
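for illustration, the shape of the link line is roughly this (library names from this thread, the rest elided):
  ld ... -z assert-deflib -z fatal-warnings -L$PROTO/lib -L$PROTO/usr/lib -luuid ...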
-
jbk
it's linking against the proto libuuid, but it seems like when it's going through all the transitive dependencies, it's looking at the build system's libraries
-
jbk
which is then tripping because (bug) the release of omnios libidspace accidentally got labeled with an ILLUMOS_1.0 version
-
jbk
but it seems like it's maybe missing something to tell it 'don't look in /lib:/usr/lib'
-
andyf
Is that somehow because proto doesn't include an ILLUMOS_1.0 libidspace?
-
andyf
(and I still can't unravel the git history around how that happened)
-
jbk
yeah, there are two issues.. the first is (AFAICT) it shouldn't be looking at /lib or /usr/lib, but because of that, it's picking up the ILLUMOS_1.0 in libidspace, which shouldn't be there
-
jbk
trying to pass '-Y P,<PROTO:LIB:LIST>' doesn't seem to help because gcc is first passing '-Y P,/lib:/usr/lib' and ld just uses that
-
jbk
which is why I was wondering about the position
-
jbk
because it seems like the -z assert-deflib should be causing an error here
-
jbk
but it's not (though maybe that check happens after? so this failure is happening before that can happen?)
-
sommerfeld
if you're getting to ld via gcc, does gcc's -nodefaultlibs help?
-
sommerfeld
(two seconds of experimentation with gcc -v -nodefaultlibs suggests it might)
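(to reproduce the check with a hypothetical test.c, compare the driver output with and without the flag:
  gcc -v test.c 2>&1 | grep -- '-Y P'
  gcc -v -nodefaultlibs test.c 2>&1 | grep -- '-Y P'
and see whether the -Y P,/lib:/usr/lib drops out of the ld invocation)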
-
sommerfeld
ok, random illumos developer survey: my main build machine is an old Xeon E5630 with 4 cpus (8 hyperthreads) and it's not enough. But I don't want to go too big. Anyone have a good sense of a sweet spot for build machine scaling these days?
-
jbk
andyf: as a quick and dirty hack, i was looking to see if there's a way to remove that version using elfedit
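(first confirming the stray version definition is there, e.g.:
  pvs -d /usr/lib/libidspace.so.1
which should list ILLUMOS_1.0 among the version definitions if the bad label is present)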
-
sommerfeld
(or, failing that, what size build machine are you happy with?)
-
rmustacc
sommerfeld: What's the budget?
-
rmustacc
And how much do you care about power?
-
rmustacc
I think the machine shapes and sizes are going to be predominantly based upon that.
-
sommerfeld
somewhat flexible on budget. looking at single-socket amd motherboards
-
rmustacc
I would probably go to an AM5 based system versus paying for an EPYC.
-
tsoome
.O Brain 16 bit... something new for every day.
-
jbk
hrm..(unrelated subject) any good strategies for looking at a tcp connection to tell if an alternate congestion alg might work better?
-
sommerfeld
jbk: snarky answer is, "yes, an alternate congestion algorithm would work better". Serious answer: there are tons of stats & parameters to look at (bandwidth-delay product, retransmission behavior, etc.) One tool I'm aware of for visualizing performance is xplot -- which lets you zoom into a time vs sequence space plot.
-
sommerfeld
what sort of symptoms are you seeing?
-
sommerfeld
if you're going to look at an alternate CC algorithm I suggest looking at BBR (mainly because I used to work near the people who did it and followed some of the work). it attempts to determine the bandwidth of your share of the bottleneck link on the path and then send at that average rate (with periodic tweaks up and down to attempt to sense increases/decreases in the bottleneck rate).
-
jbk
trying to look at each component in the chain... we have a long running zfs send -> network -> zfs recv process that seems like it's going slower than it should
-
jbk
looking at the dmu_send thread (I can't recall the exact name off the top of my head), it's spending about 40s of every minute waiting to write
-
jbk
so that suggests it's not a matter of being able to get the data off the disk fast enough (it's an all ssd pool as well)
-
jbk
and on the recv side, it's never writing out enough that it has to sync more frequently than the default 5s for the txg to close
-
jbk
so trying to look at all the bits in between (i'm sort of suspecting some interaction with the process that is sending the zfs send stream over the network, but instrumenting that has proved challenging)
-
jbk
at the same time in the lab, a similar setup is flying
-
jbk
on slower disks
-
jbk
but in the slow case, the two sides are on opposite ends of an approx 20mile fiber link
-
jbk
but latency (at least using ping; i figure that's probably worst case, ping being lowest priority, etc) is 0.4ms
-
jbk
which doesn't seem excessive
-
sommerfeld
what's the link bandwidth, and what do you have the TCP windows sized at?
-
jbk
but thought it might be nice to look at the congestion alg as well just to see if that could be a factor
-
sommerfeld
window size should be > bandwidth * RTT or else you can't fill the link.
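(worked example with an assumed 10Gb/s link and the ~0.5ms RTT here: 10e9/8 bytes/s * 0.0005s = 625,000 bytes, so a window under ~625KB can't keep the pipe full)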
-
sommerfeld
what does the RTT look like while traffic is in flight?
-
rzezeski
jbk: as a quick sanity check it's worth looking at `connstat -e -oall` as well as aggregating mibs with DTrace and see what you find
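e.g. a rough sketch for the DTrace side (mib probes fire with the counter increment in arg0):
  dtrace -n 'mib:::tcp* { @[probename] = sum(arg0); }'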
-
rzezeski
if you suspect network
-
jbk
yeah i was just remembering connstat
-
jbk
i don't know why I always forget about it given how nice it is
-
tsoome
but ofc it is a bit specific...
-
jbk
it looks like rtt is 571 (µs I think?)
-
sommerfeld
jbk: that would be consistent with 0.4ms ping RTT
-
sommerfeld
I somehow had not seen connstat
-
rzezeski
sommerfeld: it's "newer"
-
jbk
hrm.. but even at 10Gb/s, that'd suggest about a 625,000 byte window (assuming the full bw)
-
sommerfeld
what window size do you see?
-
jbk
hrm.. 401096... seems like that at least is probably not helping things
-
sommerfeld
(SWND,CWND,RWND columns for send, congestion, receive windows. CWND is computed by the congestion control algorithm)
-
sommerfeld
yeah, bump that up a bunch.
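(on illumos that'd be something like the following, values illustrative, assuming the app doesn't set SO_SNDBUF/SO_RCVBUF itself:
  ipadm set-prop -p max_buf=4194304 tcp
  ipadm set-prop -p send_buf=1048576 tcp
  ipadm set-prop -p recv_buf=1048576 tcp)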
-
sommerfeld
(.. and CWND constrains sends so the effective send window is IIRC the min of SWND and CWND)
-
sommerfeld
RTO is the retransmission timeout (per the connstat man page, in ms not us)
-
sommerfeld
one performance trap is if you're stuck with RTO-driven retransmissions instead of fast retransmissions triggered by dup acks (extra painful if you stalled because the send window is full)
-
jbk
ooof swnd 3888 cwnd 270335
-
sommerfeld
what's the suna that goes with that ("The number of unacknowledged bytes outstanding at this instant")?
-
sommerfeld
oops, gotta run.
-
sommerfeld
(that may mean you have 3888 bytes available and x00000 in flight. not necessarily cause for alarm)
-
jbk
yeah, i see swnd jumping around
-
jbk
and some rtt values around 5ms (so varying between 0.4-5ms)
-
jbk
hrm.. retranssegs keeps jumping by 3-4 segments every few seconds... seems like that is maybe not good
-
jbk
sommerfeld: so far, all we support is sunreno, newreno, and cubic for cc algs
-
jbk
though i know there's some other ones out there (I can't recall offhand) that look interesting
-
jbk
IIRC, fbsd (at least at one point) also went as far as to (as they described it) have a whole second tcp stack for some newer congestion bits (though it's unclear if that's anything inherent w/ the alg or more that it was easier to do that on fbsd vs. fitting it in with their existing stack)
-
sommerfeld
do you have sack enabled?
-
jbk
yes
-
jclulow
I think cubic is basically always at least as good as sunreno
-
jclulow
At least I've never seen it go any slower
-
jclulow
I feel like it could/should probably be the default these days
-
jclulow
There's also illumos.org/issues/15019 which I feel someone could actually do (not even that much) work on and get those defaults lifted a bit as well
-
fenix
→ BUG 15019: TCP and UDP default send_buf and recv_buf values should be increased (New)
-
jclulow
Also there's a hearsay suggestion in there that dynamic tuning of the watermarks doesn't result in the same win as some change to the constants in the source -- but sadly there was no concrete demonstration of that behaviour, or suggestions of why
-
jclulow
Alas!
-
sommerfeld
yeah, cubic is probably the best of what's available in illumos today
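(assuming the cc framework's cong_default/cong_enabled tcp properties are what your build has, switching the default is roughly:
  ipadm set-prop -p cong_default=cubic tcp
and ipadm show-prop -p cong_enabled tcp should list what's available)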
-
sommerfeld
jbk: increase in RTT under load typically indicates a standing queue building up at the bottleneck link. the key insight in BBR is to look at how the sending rate affects RTT, with a goal to slow down sends when the RTT starts ramping up to minimize the size of that queue.
-
sommerfeld
jbk: if you're seeing retransmissions that might be worth digging into why. getting a tcpdump around a retransmission event (especially if you can get it on both ends) might be informative. are packets getting lost? is there packet reordering leading to spurious retransmits?
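(e.g. with snoop on each end, interface/peer/port hypothetical:
  snoop -d i40e0 -o /var/tmp/side-a.cap host <peer> and port <port>
then line the two captures up around a retransmit event)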
-
sommerfeld
there could be mundane reasons for packet reordering on the send side depending on how the tx queue is picked on a multi-queue NIC.. if you hash on cpu to pick the tx queue (which is good for scalability as it reduces cache misses) and the scheduler moves your sending threads to a different cpu all the time..
-
jbk
the sending process is written in go, so i wonder if that might also be a factor since its runtime basically does its own scheduling of goroutines IIUC
-
sommerfeld
yeah, that could cause the cpu doing the sends on the connection to bounce around a bit.
-
sommerfeld
depending on how much cpu the sender actually needs, dinking with pbind or psrset might be informative.
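(a rough sketch, cpu numbers illustrative:
  pbind -b 4 <pid>        # nail the sender to cpu 4
or dedicate a small processor set:
  psrset -c 4 5           # prints the new set id
  psrset -b <setid> <pid>)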
-
jbk
i'm not sure offhand how MAC is doing that.. it's i40e on both sides, but over a vnic (long story), so i'm not sure how that'd impact how mac picks which ring to use