16:52:20 <antranigv> Just saw this and wanted to share: https://mastodon.social/@bcantrill/111840269356297809
18:19:49 <ajk203> Hi, all. Does anyone know where in the source I can look for the NFS client? I'm specifically looking for where it sets up its TCP socket. I'm looking for how the client sets its TCP keepalives. Can anyone point me in the right direction? Many thanks.
18:28:41 <gitomat> [illumos-gate] 16035 ::msgbuf help missing whitespace -- Jason King <jason.brian.king⊙gc>
18:34:41 <rmustacc> A bunch of nfs is in uts/common/fs/nfs and then cmd/fs.d/nfs which has a bit for the libraries, and related. I've not looked at that off hand so I can't speak to where the bit you're looking for is.
18:44:31 <danmcd> For NFS service it's rather complex thanks to very old design decisions.  NFS server starts in kernel using <ick> TLI/XTI endpoints. $UTS/common/fs/nfs/nfs_server.c has this:
18:44:35 <danmcd>         /* Create a transport handle. */
18:44:35 <danmcd>         error = svc_tli_kcreate(fp, readsize, buf, &addrmask, &xprt,
18:44:35 <danmcd>             sctp, NULL, NFS_SVCPOOL_ID, TRUE);
18:44:55 <danmcd> which deep down should open a TLI endpoint and bind port 2049 to it.
19:12:54 <sommerfeld> danmcd: question was about the client side.   ajk203: look for the "clnt" equivalents of the "svc" things Dan mentioned.  start looking around usr/src/uts/common/rpc/clnt_*.c
19:13:45 <sommerfeld> connmgr_connect() in clnt_cots.c looks relevant..
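[editor's note] For readers unfamiliar with TCP keepalives, the following is a minimal userland sketch of turning them on for a socket. It is not the kernel RPC client code discussed above (that lives under usr/src/uts/common/rpc/ and works through a different, in-kernel interface); the function names sketch_enable_keepalive and sketch_keepalive_enabled are hypothetical and exist only for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: enable TCP keepalive probes on an existing socket
 * using the standard SO_KEEPALIVE socket option.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int
sketch_enable_keepalive(int fd)
{
	int on = 1;

	return (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof (on)));
}

/*
 * Hypothetical helper: report whether SO_KEEPALIVE is currently set.
 * Returns 1 if enabled, 0 if disabled, -1 on failure.
 */
int
sketch_keepalive_enabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof (val);

	if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) != 0)
		return (-1);
	return (val != 0 ? 1 : 0);
}
```

The probe timing (idle time, interval, count) is controlled by separate, platform-specific options and is not covered here.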
19:25:12 <jbk> so who wants to update the nfs code to use ksocket? :)
19:25:22 <jbk> (as I take a step or two back)
19:26:26 <sommerfeld> "not poo" ?
19:26:42 <sommerfeld> :-)
19:27:13 <sommerfeld> (unfortunate subject line truncation, no doubt)
19:32:30 <sommerfeld> jbk: my understanding is that the issue for #16163 isn't the in-flight I/O's but rather a large worklist of block pointers read from metadata blocks.
19:34:02 <sommerfeld> (the sorted scrub idea is that you scan metadata blocks and accumulate a sorted list of block pointers, and then process them in LBA-ish order to sequentialize scrub I/O)
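[editor's note] The sorted scrub idea sommerfeld describes can be sketched roughly as below: collect lightweight records while walking metadata, then sort them by on-disk offset before issuing I/O. This is an illustrative sketch only, not the dsl_scan.c implementation; the type sketch_sio_t and the function sketch_sort_by_offset are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one queued scan I/O. */
typedef struct sketch_sio {
	uint64_t offset;	/* on-disk offset of the block */
	uint64_t size;		/* size of the block in bytes */
} sketch_sio_t;

/* Comparator for qsort(): ascending by on-disk offset. */
static int
sketch_sio_cmp(const void *a, const void *b)
{
	const sketch_sio_t *x = a;
	const sketch_sio_t *y = b;

	if (x->offset < y->offset)
		return (-1);
	if (x->offset > y->offset)
		return (1);
	return (0);
}

/*
 * Sort the accumulated worklist so that a later pass can issue reads in
 * roughly sequential (LBA-ish) order instead of metadata-discovery order.
 */
void
sketch_sort_by_offset(sketch_sio_t *list, size_t n)
{
	assert(list != NULL || n == 0);
	if (n > 1)
		qsort(list, n, sizeof (*list), sketch_sio_cmp);
}
```

Note that the worklist itself costs memory in proportion to the number of block pointers accumulated, which is the concern raised about #16163.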
19:34:11 <sommerfeld> but I've got to run now..
20:01:38 <jbk> oh heh.. when I copied it, it left off the l
20:01:47 <jbk> as i see it highlighted in the window
20:02:43 <jbk> do you know from the dumps if the memory is that and not zio_ts?
20:03:11 <jbk> we've seen the scrub prefetcher basically goes full throttle as it goes through the metadata
20:03:45 <jbk> so it can generate these absolutely massive bursts of largely prefetch I/Os
20:05:17 <jbk> it didn't help that the specific model drives from the vendor (who shall remain nameless) seemed to come down with random bouts of what I dubbed 'tortoise nervosa'
20:05:28 <jbk> but with no actual errors
20:05:34 <jbk> in any log pages or anything
20:05:39 <jbk> just get slow for a bit, then fine
20:06:26 <jbk> (they also had the fun side effect that right as it would get near finishing resilvering the pool, another disk would fail, triggering more resilvering)
20:07:58 <jbk> that would exaggerate the bias in how the zios were distributed
20:09:09 <jbk> there is an openzfs change that will switch the zio scheduler into LIFO mode if the queue depth gets too large (or too old), but that seemed more extensive
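[editor's note] As a rough illustration of the "switch to LIFO when the queue gets too deep" idea jbk mentions, here is a minimal sketch. It is not the openzfs change itself; the queue type, the threshold field, and all function names are hypothetical, and the "too old" condition is left out.

```c
#include <stddef.h>

#define	SKETCH_QCAP	64	/* fixed capacity, chosen for illustration */

/* Hypothetical ring-buffer queue of request ids. */
typedef struct sketch_queue {
	int items[SKETCH_QCAP];
	size_t head;		/* index of the oldest entry */
	size_t count;		/* number of queued entries */
	size_t lifo_threshold;	/* depth above which we pop newest first */
} sketch_queue_t;

void
sketch_queue_init(sketch_queue_t *q, size_t lifo_threshold)
{
	q->head = 0;
	q->count = 0;
	q->lifo_threshold = lifo_threshold;
}

/* Append a request; returns 0 on success, -1 if the queue is full. */
int
sketch_queue_push(sketch_queue_t *q, int item)
{
	if (q->count == SKETCH_QCAP)
		return (-1);
	q->items[(q->head + q->count) % SKETCH_QCAP] = item;
	q->count++;
	return (0);
}

/*
 * Remove a request; returns 0 on success, -1 if the queue is empty.
 * At or below the threshold the oldest entry is served (FIFO); above
 * it the newest entry is served (LIFO).
 */
int
sketch_queue_pop(sketch_queue_t *q, int *out)
{
	if (q->count == 0)
		return (-1);
	if (q->count > q->lifo_threshold) {
		*out = q->items[(q->head + q->count - 1) % SKETCH_QCAP];
	} else {
		*out = q->items[q->head];
		q->head = (q->head + 1) % SKETCH_QCAP;
	}
	q->count--;
	return (0);
}
```

With a threshold of 3 and five queued requests, the two newest are served first, after which the queue falls back to oldest-first order.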
21:52:47 <ajk203> @danmcd. thanks. @sommerfeld thanks I'll take a look at the clnt code. many thanks.
22:06:07 <sommerfeld> jbk: sorry, was off running errands.   
22:08:24 <sommerfeld> see the "Grand Theory Statement" in dsl_scan.c; BP's found in metadata get recorded as a scan_io_t which is " the minimum information needed to reconstruct a
22:08:24 <sommerfeld>  * zio for sequential scanning."
22:09:52 <jbk> yeah, i was just wondering if those dumps showed where the actual space was being used -- in the issue we had, we'd routinely see > 100gb of zio_ts queued in the pool, and pretty much all of it was prefetching
22:10:52 <jbk> we only saw it with one customer (but it was a $@#$@#$ to root cause for various reasons) and it seems like something that needs a large pool and lots of ram
22:12:14 <jbk> and probably helped by disk performance dropping for unexplained reasons for stretches at a time during the resilver
22:14:53 <jbk> i've mentioned it before, but we've seen the same conceptual problem with other bits in zfs (basically generating load w/o any backpressure or throttle and hoping the system can handle it)
22:14:57 <jbk> e.g. zfs diff
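[editor's note] The missing backpressure jbk describes can be illustrated with a simple in-flight byte budget: a producer must reserve space before queuing more work and backs off when the budget is exhausted. This is a generic sketch, not code from ZFS; sketch_throttle_t and its functions are hypothetical names, and a real implementation would also need locking and a way to wake waiting producers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte budget for outstanding (queued or in-flight) work. */
typedef struct sketch_throttle {
	uint64_t cap;		/* maximum bytes allowed outstanding */
	uint64_t inflight;	/* bytes currently outstanding */
} sketch_throttle_t;

void
sketch_throttle_init(sketch_throttle_t *t, uint64_t cap)
{
	t->cap = cap;
	t->inflight = 0;
}

/*
 * Try to reserve room for a new request. Returns 1 if the request fits
 * in the budget (and accounts for it), 0 if the producer should wait.
 */
int
sketch_throttle_reserve(sketch_throttle_t *t, uint64_t bytes)
{
	/* inflight never exceeds cap, so the subtraction cannot wrap. */
	if (bytes > t->cap - t->inflight)
		return (0);
	t->inflight += bytes;
	return (1);
}

/* Return budget when a request completes. */
void
sketch_throttle_release(sketch_throttle_t *t, uint64_t bytes)
{
	assert(bytes <= t->inflight);
	t->inflight -= bytes;
}
```

A producer that gets 0 back from sketch_throttle_reserve() would pause until completions call sketch_throttle_release(), which bounds the amount of queued work regardless of how fast the disks are.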
23:01:15 <sommerfeld> jbk: I didn't look into where the memory was going in those dumps.
23:04:45 <sommerfeld> taking a look now
23:24:17 <sommerfeld> ~900MB in the sio_cache_* caches which is where the sorted block pointers go (this is a 24G machine).