#illumos

00:42

gwr_

code.illumos.org/c/illumos-gate/+/4795 17694 test runner silently ignores missing tests in runfile
00:42

fenix

→ CODE REVIEW 4795: 17694 test runner silently ignores missing tests in runfile (NEW) | illumos.org/issues/17694
17:03

jbk

sommerfeld: in your investigations around tcp congestion -- did you notice if the amount of 'control packets' (not really sure what the right name is -- they're just IP+TCP headers with no data)
17:03

jbk

decreased
17:03

jbk

?
17:04

jbk

i've noticed in some packet captures over time that we seem to send a fair amount of such packets (but investigating other problems, never had a chance to dig deeper)
17:05

jbk

to the point I was contemplating for some of the NICs we use adding a separate cache of pre-DMA mapped buffers that are smaller (e.g. 512 bytes) to use when we're handed tiny packets (< 512 bytes)
17:06

jbk

(since the naive approach most NICs take can result in multiple GB of kernel memory being dedicated just for a NIC)
17:24

danmcd

jbk -> small caches for ACKs makes some modicum of sense at first glance, but if anyone knows the potential pitfalls it'll be Bill.
17:27

jbk

i know also in our case, things like SMB traffic can also result in small responses from the protocol as well..
17:28

jbk

since RAM is so expensive, having a driver allocate so much kernel memory that it could typically buffer multiple seconds of data (at line rate) even for 200Gb NICs gets harder to justify
17:30

jbk

(I'd love to have something like kmem_cache, but with a cap that could be used)
17:49

danmcd

You can probably, in your constructor/destructor for a kmem_cache, or as a wrapper to kmem*_alloc(), check if a request exceeds your cap, then either block-and-wait (no KM_NOSLEEP), spin-and-check (KM_NOSLEEP but not _LAZY or existence of KM_NORMALPRI), or return failure (KM_NOSLEEP_LAZY).
17:50

danmcd

Since you own the driver (or mac if you make it generic) you can restrict to KM_NOSLEEP_LAZY only. :)
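
A minimal userland sketch of the "wrapper with a cap" idea above, restricted to the return-failure behavior danmcd associates with KM_NOSLEEP_LAZY. Everything here is an assumption made for illustration: capped_pool_t, capped_alloc() and capped_free() are hypothetical names, malloc() stands in for the kernel allocator, and there is no locking, which a real driver would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capped pool: tracks bytes handed out against a fixed cap. */
typedef struct capped_pool {
	size_t cp_cap;		/* maximum bytes that may be outstanding */
	size_t cp_inuse;	/* bytes currently outstanding */
} capped_pool_t;

/*
 * Allocate size bytes, or fail immediately if that would exceed the cap.
 * This never blocks or retries; the caller treats NULL as backpressure.
 * malloc() is a userland stand-in for the kernel allocator.
 */
static void *
capped_alloc(capped_pool_t *cp, size_t size)
{
	void *buf;

	/* cp_inuse never exceeds cp_cap, so the subtraction cannot wrap. */
	if (size > cp->cp_cap - cp->cp_inuse)
		return (NULL);

	buf = malloc(size);
	if (buf != NULL)
		cp->cp_inuse += size;
	return (buf);
}

/* Return a buffer to the pool; size must match the original request. */
static void
capped_free(capped_pool_t *cp, void *buf, size_t size)
{
	assert(size <= cp->cp_inuse);
	free(buf);
	cp->cp_inuse -= size;
}
```

With a 1024-byte cap, two 512-byte requests succeed and a third request fails until one of the first two is freed.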
19:49

richlowe

this
19:49

richlowe

... seems like it's going to make memory availability even harder to predict
19:50

richlowe

is there really no better source of backpressure than failing the allocator?
19:53

richlowe

as well, I feel like when OpenBSD introduced the ability for pf to prioritize ACK and other empty packets, they saw big perceived perf. improvements. Would the opposite happen if we ran out of pre-allocated space?
20:19

jbk

well that's why it'd be nice to have a cap
20:20

jbk

right now most NIC drivers tend to do things in a simple, but naive way, which usually results in excessive memory use
20:20

jbk

and in some cases, _extreme_ memory use (60+GB of kernel ram)
20:22

jbk

and if you do the math, the driver basically can (given the link speed) buffer multiple seconds worth of packets at line rate, which should rarely be necessary (if ever)
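
The math referred to above can be sketched as a one-line calculation. The helper name buffered_seconds() is hypothetical, and pairing the 60 GB figure with a 200 Gb/s link is an assumption made only to have concrete numbers; the log mentions the two separately.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds of traffic that buf_bytes of buffer memory can hold when the
 * link runs at link_bps bits per second (line rate, no overhead modeled).
 */
static double
buffered_seconds(uint64_t buf_bytes, uint64_t link_bps)
{
	assert(link_bps > 0);
	return ((double)buf_bytes * 8.0 / (double)link_bps);
}
```

Under those assumed numbers, buffered_seconds(60000000000ULL, 200000000000ULL) is 480e9 bits divided by 200e9 bits per second, or 2.4 seconds of line-rate traffic.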
20:31

Dixie_F

Wouldn't it be cheaper to actually just buffer a part of a second? Nothing urgent should need more than a second to complete and if you are going to allocate the memory, why not get the benefit for the fast moving stuff you care about (FWIW I am not a software guy but may have designed chips in your phone)
20:33

Dixie_F

As in caching blindly seems better than letting stuff that doesn't make it out of the buffer quickly eat memory
20:35

Dixie_F

Ah, or is the idea that you ack and thus if your buffer overflows, you have lost the handshake aspect...
20:38

jbk

though on the TX side, it's technically not a problem -- generally for larger packets we DMA bind the existing memory and just copy the smaller ones, but even if we ran out of smaller ones, we could still bind them (just at a cost of some extra overhead for that packet)
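
A rough sketch of the TX decision jbk describes: copy small packets into a small pre-mapped buffer when one is available, otherwise fall back to DMA binding. The names tx_method_t and tx_pick_method() and the parameters are hypothetical; this models only the choice, not any real driver's transmit path.

```c
#include <stddef.h>

/* Hypothetical labels for the two ways a packet can reach the NIC. */
typedef enum tx_method {
	TX_COPY,	/* copy into a small pre-mapped buffer */
	TX_BIND		/* DMA bind the packet's existing memory */
} tx_method_t;

/*
 * Pick a method for a packet of pkt_len bytes. Packets below copy_thresh
 * are copied while small buffers remain; everything else is bound. Running
 * out of small buffers is not an error: the packet is bound instead, at
 * the cost of some extra per-packet overhead.
 */
static tx_method_t
tx_pick_method(size_t pkt_len, size_t copy_thresh, size_t small_bufs_free)
{
	if (pkt_len < copy_thresh && small_bufs_free > 0)
		return (TX_COPY);
	return (TX_BIND);
}
```

The point of the fallback is that exhausting the small cache degrades performance for that packet rather than dropping it.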
20:48

MelanieUrsidino

I'm being endofuncted
20:48

MelanieUrsidino

help
20:48

MelanieUrsidino

(this is a joke)
22:29

sommerfeld

jbk: I generally see those described as ack-only packets.
22:31

sommerfeld

one way to handle them on RX is to copy them into a freshly allocated small mblk (they're small, after all..), freeing the large mapped mblk back where the driver can re-post it for receive.
22:36

rzezeski

which is what every driver should already be doing via its copy threshold
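
The RX copy-threshold approach sommerfeld and rzezeski describe can be sketched in userland as follows. This is an illustration under assumptions: rx_result_t and rx_deliver() are hypothetical names, plain pointers stand in for mblks, and malloc() stands in for allocating a fresh small mblk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of handling one received frame. */
typedef struct rx_result {
	void *rr_data;		/* buffer handed up the stack */
	int rr_recycled;	/* nonzero if the large buffer can be re-posted */
} rx_result_t;

/*
 * Frames at or below copy_thresh are copied into a freshly allocated
 * small buffer so the large DMA-mapped buffer can go straight back to
 * the receive ring. Larger frames, or a failed small allocation, pass
 * the large buffer up as-is.
 */
static rx_result_t
rx_deliver(void *dma_buf, size_t len, size_t copy_thresh)
{
	rx_result_t rr = { dma_buf, 0 };

	assert(dma_buf != NULL);
	if (len > 0 && len <= copy_thresh) {
		void *small = malloc(len);

		if (small != NULL) {
			memcpy(small, dma_buf, len);
			rr.rr_data = small;
			rr.rr_recycled = 1;
		}
	}
	return (rr);
}
```

A 64-byte ack-only frame with a 256-byte threshold comes back as a small copy with rr_recycled set; a 1500-byte frame keeps its original buffer.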
22:51

jbk

i mean on tx
22:51

jbk

it's almost always copied into an MTU-sized buffer
22:52

jbk

vs. having a separate cache of smaller buffers that could leave the bigger ones for bigger packets
22:53

jbk

we usually allocate as many MTU-sized buffers as # tx rings x ring size just in case we have a packet split across a bunch of tiny mblk_ts
22:53

jbk

so we can copy all of those tiny mblk_ts into one larger buffer
22:54

jbk

since the larger mblk_ts will usually just be bound/mapped directly
22:54

jbk

(as long as the # of cookies doesn't exceed any NIC limits)
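
To put numbers on the "# tx rings x ring size" sizing above, here is a small sketch. The helper tx_copy_buf_bytes() is a hypothetical name, and the figures in the note after it (16 rings, 4096 descriptors, a 9000-byte MTU) are assumed for illustration, not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of copy buffers needed if every descriptor on every TX ring
 * gets its own buffer of buf_size bytes.
 */
static uint64_t
tx_copy_buf_bytes(uint64_t nrings, uint64_t ring_size, uint64_t buf_size)
{
	uint64_t nbufs;

	/* Guard against overflow in the two multiplications. */
	assert(ring_size == 0 || nrings <= UINT64_MAX / ring_size);
	nbufs = nrings * ring_size;
	assert(buf_size == 0 || nbufs <= UINT64_MAX / buf_size);
	return (nbufs * buf_size);
}
```

With the assumed figures, MTU-sized buffers come to 16 x 4096 x 9000 = 589,824,000 bytes, while the same number of 512-byte buffers would take 33,554,432 bytes. That gap is the motivation for a separate small-buffer cache.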
23:24

gitomat

[illumos-gate] 17331 convert mdb(1) to mdoc -- Andy Fiddaman <illumos⊙fn>