-
gitomat
[illumos-gate] 13095 ZFS ARC minimum size is too large -- Joshua M. Clulow <josh⊙so>
-
danmcd
@jclulow --> While merging your arc changes, I noticed in illumos-joyent that:
-
danmcd
1.) zfs_arc_overflow_shift is lower in illumos-joyent (3 instead of 8, seemingly as a workaround for OS-6385).
-
danmcd
2.) A similar low-end clamp on large-memory machines was 1GB instead of 2GB.
-
danmcd
So as part of merging, eliminating #2 is easy (since yours/upstream's says 2GB instead), and I'm thinking resetting zfs_arc_overflow_shift back to 8 to match upstream seems reasonable
-
danmcd
because the change to 3 was a one-line workaround for what became fenix illumos#9284
-
fenix
BUG 9284: arc_reclaim_thread has 2 jobs (Closed)
-
danmcd
Dammit, lemme rephrase. Setting it to 3 was a placeholder UNTIL 9284 (aka OS-6363) and also fenix illumos#9018
-
fenix
OS-6363: system went to dark side of moon for ~467 seconds (Resolved)
-
danmcd
(aka OS-6404)
-
danmcd
I'm mentioning this in case the ghosts of SmartOS past are suggesting "Oh shit, maybe either 1GB min-ceiling or zfs_arc_overflow_shift set to 3 should actually be upstreamed!!!" I think those ghosts have been exorcised (and I'll be making merge choice there appropriately), but hearing they have from someone else would be nice.
-
jbk
speaking of systems going to the dark side of the moon... once it gets some more soak time, i can throw up the i40e patches that should largely prevent the 'stuck creating vnic for minutes' issue
-
jbk
the only catch is to avoid having to do an extensive rewrite of the tx logic (which the hw makes complicated enough), it does have to bump the minimum DMA threshold up to 512 bytes
-
jbk
at least with our stuff, we've found setting it to 512 yields better performance anyway, so at least for us it wasn't a big deal
-
danmcd
You mean for the whole system DMA threshold?
-
jbk
for the nic
-
jbk
when it decides to DMA bind vs. copy into pre-allocated buf
-
danmcd
Oh yeah.
-
sommerfeld
the usual high fixed cost for DMA setup vs variable cost for copy tradeoff that's been with us forever.
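
A minimal sketch of the bind-vs-copy decision described above, assuming a per-fragment length check against a tunable threshold. The names `tx_choose_method` and `TX_DMA_MIN_BYTES` are hypothetical and not taken from the i40e driver; only the 512-byte value comes from the discussion, and treating "at or above the threshold" as the bind case is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; 512 bytes is the threshold mentioned in the chat. */
#define	TX_DMA_MIN_BYTES	512

typedef enum { TX_COPY, TX_DMA_BIND } tx_method_t;

/*
 * Small fragments are copied into a pre-allocated, already-mapped buffer
 * (cost grows with length). Larger fragments are DMA bound (high fixed
 * setup cost, but no copy).
 */
static tx_method_t
tx_choose_method(size_t frag_len, size_t dma_min)
{
	assert(dma_min > 0);
	return (frag_len < dma_min ? TX_COPY : TX_DMA_BIND);
}
```

Raising the threshold moves more small fragments onto the copy path, which is the tradeoff sommerfeld describes.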
-
andyf
Oh, this is too funny..
illumos/illumos-gate #89
-
sommerfeld
it needs a polite but firm rejection.
-
jbk
looking at the account, they've done that to a few other repos
-
jclulow
danmcd: Yeah I suspect small delta in thresholds should go away, and substantive improvements to mechanisms should come upstream
-
jclulow
The more we're running the same bits everywhere the better I reckon
-
andyf
I am planning to resurrect
code.illumos.org/c/illumos-gate/+/1272 in some form too.
-
fenix
→ CODE REVIEW 1272: vm_pageout improvements from SmartOS/OmniOS - WIP (NEW)
-
danmcd
SO thank you @jclulow --> this reduces our arc.c diffs to the zone ZFS I/O priority, which I'm totally cool about.
-
jclulow
FWIW, I would also totally support arc_c_min cap coming down to 1GB
-
danmcd
Oh?
-
danmcd
I can keep it 1GB (by modifying your new upstream code) in SmartOS for the time being.
-
jclulow
I pulled 2GB from my hat, really. It could be smaller. Mostly making it not smaller was because I am chicken
-
danmcd
But SmartOS had it at 1GB for a long time. Tell you what, I'm keeping it 1G and filing a bug.
-
jclulow
Yeah I think that's better
-
jclulow
As with most of these values that scaled linearly with some physical property of the machine (RAM size, CPU thread count, etc) I think we were only ever looking at a very narrow region of the chart of appropriate values
-
jclulow
And it would be appropriate to put relatively low absolute caps on many of them that grow to be quite large
-
jclulow
The ARC definitely won't shrink unless you push it to shrink, and if you're pushing it to shrink it really probably needed to happen based on what you're putting on the machine
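
A rough sketch of the idea above: a value that scales linearly with RAM, bounded by a floor and a relatively low absolute cap. This is not the arc.c code. The function name, the shift of 6, and the 64 MiB floor are made up for illustration; only the 1 GB and 2 GB caps come from the discussion.

```c
#include <assert.h>
#include <stdint.h>

#define	MiB	(1ULL << 20)
#define	GiB	(1ULL << 30)

/*
 * Scale a tunable as physmem >> shift, then clamp it to [floor, cap].
 * All parameters are illustrative; none are the actual ARC tunables.
 */
static uint64_t
scaled_with_cap(uint64_t physmem_bytes, unsigned shift, uint64_t floor,
    uint64_t cap)
{
	uint64_t v;

	assert(shift < 64);
	assert(floor <= cap);

	v = physmem_bytes >> shift;
	if (v < floor)
		v = floor;
	if (v > cap)
		v = cap;
	return (v);
}
```

With these made-up parameters, a 16 GiB machine lands in the linear region, while a 1 TiB machine hits whichever cap is chosen, so the 1 GB vs 2 GB question only matters on large-memory systems.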
-
alanc
looks like a bunch of ignored entries under
github.com/illumos/illumos-gate/pulls beyond the AGPL troll
-
» nomad first read that as "a bunch of ignorant entries"
-
alanc
not entirely wrong, if viewed as ignorance of the actual process to integrate changes to illumos-gate
-
ptribble
I read the latest username as "eejit", which seemed appropriate
-
neirac
jclulow I remember running ZFS on an old Pentium 3 (Nevada b35) with 512 MB, so maybe the ARC at that time could be smaller?
-
neirac
andyf just added a new column to the ::vmm dcmd, just the zone_t to know which VM is running in which zone, just to save typing
code.illumos.org/c/illumos-gate/+/3095 and the indestructible flag
-
fenix
→ CODE REVIEW 3095: 14525 Would like kmdb module for vmm (NEW) |
illumos.org/issues/14525
-
danmcd
@jclulow fenix illumos#15995
-
fenix
BUG 15995: 13095's 2GB arc_c_min ceiling should be 1GB (New)
-
gitomat
[illumos-gate] 15931 Add ::tdelta dcmd to mdb -- Jason King <jking⊙rc>
-
jclulow
alanc: Do you recall if KM_PUSHPAGE and the pageout_reserve threshold were added specifically for ZFS, or if there were also issues with swap to UFS files and NFS that precipitated it? I don't see KM_PUSHPAGE in a Solaris 9 manual PDF I found online.
-
alanc
PSARC 2002/761 KM_PUSHPAGE makes no mention of ZFS (which was still in early development then) nor any other filesystems
-
alanc
looks like it was first integrated to early S10 though
-
alanc
the PSARC makes it sound like pageout_reserve dates back to 2.6 though
-
richlowe
I think this whole web of stuff is a mess, honestly
-
richlowe
but I'm not sure what we could do about it
-
richlowe
I think `pageout_reserve` might be in the Solaris Internals books
-
richlowe
if that helps at all
-
alanc
yeah, pageout_reserve is much older
-
alanc
bonwick, 96/07/28:
-
alanc
We introduce a new variable, pageout_reserve, which by default
-
alanc
is 1/2 of throttlefree. In page_create_throttle(), we allow all
-
alanc
allocations by pageout or sched to succeed (as we do today).
-
alanc
However, for non-blocking allocations, we fail the allocation
-
alanc
for any other threads if freemem < pageout_reserve.
-
alanc
from Solaris 2.6
-
alanc
that one mentions "third party drivers or if you page over NFS"
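
A small sketch of the rule in the quoted 96/07/28 note, covering only the non-blocking case it describes. The struct and function names are hypothetical; this is not the real page_create_throttle(), which handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three values named in the quote. */
typedef struct {
	uint64_t freemem;
	uint64_t throttlefree;
	uint64_t pageout_reserve;
} vm_state_t;

/* Per the quote, pageout_reserve defaults to 1/2 of throttlefree. */
static vm_state_t
vm_state_default(uint64_t freemem, uint64_t throttlefree)
{
	vm_state_t vm = { freemem, throttlefree, throttlefree / 2 };
	return (vm);
}

/*
 * Non-blocking allocations: pageout and sched always succeed; any other
 * thread fails once freemem drops below pageout_reserve.
 */
static bool
nonblocking_alloc_allowed(vm_state_t vm, bool is_pageout_or_sched)
{
	if (is_pageout_or_sched)
		return (true);
	return (vm.freemem >= vm.pageout_reserve);
}
```

The effect is that the last pageout_reserve pages stay available to the threads that can actually free memory.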
-
fenix
→ OpenSolaris issue 1261108: page_create_throttle() should keep a few pages solely for pageout (Fix Delivered)
-
jclulow
Yeah thank you that's very helpful
-
alanc
and that references a bug from an entirely different universe:
illumos.org/opensolaris/bugdb/bug.html#!1168966 in which paging parameters are tuned to support "about 300 telnet users"
-
fenix
→ OpenSolaris issue 1168966: paging thresholds are too low on very big systems causing kmem alloc failures (Fix Delivered)
-
jclulow
alanc: this is delightful