00:42:20 https://code.illumos.org/c/illumos-gate/+/4795 17694 test runner silently ignores missing tests in runfile
00:42:21 → CODE REVIEW 4795: 17694 test runner silently ignores missing tests in runfile (NEW) | https://www.illumos.org/issues/17694
17:03:35 sommerfeld: in your investigations around tcp congestion -- did you notice if the amount of 'control packets' (not really sure what the right name is -- they're just IP+TCP headers with no data)
17:03:41 decreased
17:03:42 ?
17:04:21 i've noticed in some packet captures over time that we seem to send a fair amount of such packets (but investigating other problems, never had a chance to dig deeper)
17:05:37 to the point I was contemplating for some of the NICs we use adding a separate cache of pre-DMA mapped buffers that are smaller (e.g. 512 bytes) to use when we're handed tiny packets (< 512 bytes)
17:06:20 (since the naive approach most NICs take can result in multiple GB of kernel memory being dedicated just for a NIC)
17:24:26 jbk -> small caches for ACKs makes some modicum of sense at first glance, but if anyone knows the potential pitfalls it'll be Bill.
17:27:18 i know also in our case, things like SMB traffic can also result in small responses from the protocol as well..
17:28:23 since RAM is so expensive, having a driver allocate so much kernel memory that it could typically buffer multiple seconds of data (at line rate) even for 200Gb NICs gets harder to justify
17:30:51 (I'd love to have something like kmem_cache, but with a cap that could be used)
17:49:29 You can probably, in your constructor/destructor for a kmem_cache, or as a wrapper to kmem*_alloc(), check if a request exceeds your cap, then either block-and-wait (no KM_NOSLEEP), spin-and-check (KM_NOSLEEP but not _LAZY or existence of KM_NORMALPRI), or return failure (KM_NOSLEEP_LAZY).
17:50:13 Since you own the driver (or mac if you make it generic) you can restrict to KM_NOSLEEP_LAZY only. :)
19:49:23 this
19:49:42 ... seems like it's going to make memory availability even harder to predict
19:50:10 is there really no better source of backpressure than failing the allocator?
19:53:27 as well, I feel like when OpenBSD introduced the ability to pf to prioritize ACK and other empty packets, they saw big perceived perf. improvements. Would the opposite happen if we ran out of pre-allocated space?
20:19:42 well that's why it'd be nice to have a cap
20:20:31 right now most NIC drivers tend to do things in a simple, but naive way, which usually results in excessive memory use
20:20:46 and in some cases, _extreme_ memory use (60+GB of kernel ram)
20:22:02 and if you do the math, the driver basically can buffer (given the link speed) multiple seconds' worth of packets at line rate, which should rarely be necessary (if ever)
20:31:14 Wouldn't it be cheaper to actually just buffer a part of a second? Nothing urgent should need more than a second to complete and if you are going to allocate the memory, why not get the benefit for the fast moving stuff you care about (FWIW I am not a software guy but may have designed chips in your phone)
20:33:54 As in caching blindly seems better than letting stuff that doesn't make it out of the buffer quickly eat memory
20:35:53 Ah, or is the idea that you ack and thus if your buffer overflows, you have lost the handshake aspect...
20:38:10 though on the TX side, it's technically not a problem -- generally for larger packets we DMA bind the existing memory and just copy the smaller ones, but even if we ran out of smaller ones, we could still bind them (just at a cost of some extra overhead for that packet)
20:48:38 I'm being endofuncted
20:48:40 help
20:48:42 (this is a joke)
22:29:29 jbk: I generally see those described as ack-only packets.
22:31:11 one way to handle them on RX is to copy them into a freshly allocated small mblk (they're small, after all..), freeing the large mapped mblk back where the driver can re-post it for receive.
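[editor's sketch] The RX approach in the 22:31:11 message (copy small frames into a fresh small allocation so the large mapped buffer can be re-posted right away) might look roughly like the userland C model below. Everything in it is a hypothetical stand-in: `rx_buf_t`, `rx_pkt_t`, `rx_deliver()` and the `copy_thresh` parameter are illustrative names, and `malloc()`/`memcpy()` take the place of whatever mblk allocation and DMA sync calls a real driver would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pre-mapped, MTU-sized receive buffer. */
typedef struct rx_buf {
	uint8_t	*rb_data;	/* would be DMA-mapped memory in a driver */
	size_t	rb_size;	/* capacity of the buffer */
	bool	rb_loaned;	/* true while the stack still holds it */
} rx_buf_t;

/* Hypothetical stand-in for the packet handed up the stack. */
typedef struct rx_pkt {
	uint8_t	*rp_data;
	size_t	rp_len;
	bool	rp_copied;	/* true if rp_data is a private heap copy */
} rx_pkt_t;

/*
 * Deliver a received frame of `len` bytes held in `buf`.
 *
 * Frames at or below `copy_thresh` are copied into a small heap buffer,
 * so the large buffer is immediately free to be re-posted.  Larger
 * frames are loaned upward without a copy, which keeps the large buffer
 * tied up until the consumer is done with it.
 *
 * Returns false only if the small allocation fails.
 */
bool
rx_deliver(rx_buf_t *buf, size_t len, size_t copy_thresh, rx_pkt_t *pkt)
{
	assert(len <= buf->rb_size);

	if (len <= copy_thresh) {
		/* Avoid malloc(0), whose result is implementation-defined. */
		uint8_t *copy = malloc(len > 0 ? len : 1);

		if (copy == NULL)
			return (false);
		memcpy(copy, buf->rb_data, len);
		pkt->rp_data = copy;
		pkt->rp_len = len;
		pkt->rp_copied = true;
		buf->rb_loaned = false;	/* driver may re-post it now */
		return (true);
	}

	pkt->rp_data = buf->rb_data;
	pkt->rp_len = len;
	pkt->rp_copied = false;
	buf->rb_loaned = true;		/* unavailable until freed upstream */
	return (true);
}
```

In this model an ack-only frame never keeps an MTU-sized buffer out of the ring; the price is one small allocation and a short copy per frame.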
22:36:06 which is what every driver should already be doing via its copy threshold
22:51:43 i mean on tx
22:51:56 it's almost always copied into an MTU-sized buffer
22:52:29 vs. having a separate cache of smaller buffers that could leave the bigger ones for bigger packets
22:53:28 we usually allocate as many MTU-sized buffers as # tx rings x ring size just in case we have a packet split across a bunch of tiny mblk_ts
22:53:42 so we can copy all of those tiny mblk_ts into one larger buffer
22:54:17 since the larger mblk_ts will usually just be bound/mapped directly
22:54:31 (as long as the # of cookies doesn't exceed any NIC limits)
23:24:55 [illumos-gate] 17331 convert mdb(1) to mdoc -- Andy Fiddaman
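[editor's sketch] The TX idea discussed above (a capped pool of small pre-mapped buffers for packets under 512 bytes, falling back to a DMA bind when the pool is empty) could be modeled as below. This is only a decision-logic sketch under stated assumptions: `tx_small_pool_t`, `tx_choose()` and `tx_small_release()` are hypothetical names, the pool just counts buffers instead of owning memory, and the cookie-limit case from 22:54:31 (where a real driver would fall back to copying into an MTU-sized buffer) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define	TX_SMALL_BUF_SIZE	512	/* the "e.g. 512 bytes" from the chat */

typedef enum tx_action {
	TX_COPY_SMALL,	/* copy into a small pre-mapped buffer */
	TX_BIND		/* DMA bind the packet's existing memory */
} tx_action_t;

/* Hypothetical capped pool; it tracks counts only in this sketch. */
typedef struct tx_small_pool {
	size_t	sp_cap;		/* hard cap on small buffers */
	size_t	sp_inuse;	/* small buffers currently handed out */
} tx_small_pool_t;

/*
 * Pick how to transmit a packet of `pkt_len` bytes.  Tiny packets use a
 * small buffer while the pool is under its cap.  Once the cap is hit,
 * or for larger packets, fall back to binding, so running out of small
 * buffers costs extra per-packet overhead instead of failing.
 */
tx_action_t
tx_choose(tx_small_pool_t *pool, size_t pkt_len)
{
	assert(pool->sp_inuse <= pool->sp_cap);

	if (pkt_len < TX_SMALL_BUF_SIZE && pool->sp_inuse < pool->sp_cap) {
		pool->sp_inuse++;
		return (TX_COPY_SMALL);
	}
	return (TX_BIND);
}

/* Return a small buffer to the pool once its transmit has completed. */
void
tx_small_release(tx_small_pool_t *pool)
{
	assert(pool->sp_inuse > 0);
	pool->sp_inuse--;
}
```

Because exhaustion degrades to a bind, the cap bounds memory use without making the allocator the source of backpressure, which was the concern raised at 19:50:10.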