-
{3168}
Hello
-
gitomat
[illumos-gate] 15846 Restructure CPU microcode update -- Andy Fiddaman <illumos⊙fn>
-
gitomat
[illumos-gate] 15866 nfssrv: suggest parentheses around assignment used as truth value -- Toomas Soome <tsoome⊙mc>
-
sjorge
Seems the nfs changes have for now at least broken plexmediaserver running from an nfs share
-
sjorge
ERROR - [Req#9c0] Waited over 10 seconds for a busy database; giving up.
-
sjorge
Get these all the time now, but not if I revert to omnios-bloody-20230817
-
sjorge
only big diff seems to be the nfs stuff
-
sjorge
Can I just mount it with nfsver=4.0 to get the old behavior?
-
sjorge
linux seems to autodetect vers=4.2, but I think the last batch of nfs stuff was not 4.2 yet but 4.1?
-
tsoome
4.1, yes
-
jclulow
sjorge: Can you file an issue
-
jclulow
It's important that we back it out promptly if it is broken
-
sjorge
I'm talking with papertigers
-
jclulow
ok thanks
-
sjorge
His mount -t nfs4 shows
-
sjorge
vers=4.0
-
sjorge
where mine does vers=4.2
-
sjorge
He's on omnios stable
-
sjorge
Pre changes mine also shows vers=4.0
-
tsoome
is it possible to get saved packet stream from time of the issue?
-
sjorge
Not sure I can isolate it nicely
-
sjorge
I can instantly trigger it by playing a movie and plex updating the watch status in the db
-
sjorge
But that dump would be a few gig after a few minutes
-
tsoome
see snoop -O option
-
tsoome
you can limit the size of the output, just need to terminate snoop quick enough
-
sjorge
-
fenix
→ BUG 15869: Issue with linux mounting nfs after 15405 (New)
-
sjorge
It's pretty late, so back on the older BE
-
sjorge
I'll see if I can get a dump of sorts tomorrow
-
sjorge
I'll try once more on the new BE with nfsvers=4.0 in the mount options, to see if it's something triggered by the linux nfs client only when in 4.2 mode
-
sjorge
Interesting start: httpd (also pulling in stuff via nfs, like DocRoot and config dir) now starts again at system boot with nfsvers=4.0. Before, it would not start at boot but would start fine afterwards. I assumed that was unrelated, but it might not have been
-
sjorge
But rather tired, gonna try and finish watching the movie and if it crashes go to bed, if it doesn't it's an interesting datapoint for tomorrow when I am less sleepy
-
danmcd
Oh my... yeah, we seem to be advertising we support 4.2:
-
danmcd
build.work.kebe.com:/export/home on /net/build.work.kebe.com/export/home type nfs4 (rw,nosuid,nodev,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.24.4.180,local_lock=none,addr=172.24.4.184)
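
A minimal sketch, in C, of pulling the negotiated version out of a Linux mount option string like the one above. The helper name nfs_vers_from_opts is hypothetical and not part of any real tool; it only assumes the comma-separated "vers=X.Y" form shown in the mount output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from any real tool): copy the value of the
 * "vers=" option out of a comma-separated mount option string.
 * Returns 0 on success, -1 if no option starts with "vers=" or if
 * buf is too small to hold the value.
 */
int
nfs_vers_from_opts(const char *opts, char *buf, size_t buflen)
{
	static const char key[] = "vers=";
	const size_t keylen = sizeof (key) - 1;
	const char *p = opts;

	assert(opts != NULL && buf != NULL);

	while (*p != '\0') {
		/* Length of the current option, up to the next comma. */
		size_t toklen = strcspn(p, ",");

		if (toklen >= keylen && strncmp(p, key, keylen) == 0) {
			size_t vallen = toklen - keylen;

			if (vallen >= buflen)
				return (-1);
			memcpy(buf, p + keylen, vallen);
			buf[vallen] = '\0';
			return (0);
		}
		p += toklen;
		if (*p == ',')
			p++;
	}
	return (-1);
}
```

Fed the option string from the mount line above, this would yield "4.2", which is the value being compared against the "vers=4.0" seen on the older boot environment.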
-
danmcd
Hasn't crashed yet, but I'll bet if I do something that triggers something NFS4.2 specific whammo!
-
sjorge
well plex is using a modified sqlite db
-
sjorge
so not iseal
-
sjorge
*ideal
-
sjorge
but for single host access it was working fine for years
-
danmcd
sjorge: I might have a patchy/hacky fix. Examine $UTS/common/fs/nfs/nfs4_dispatch.c
-
danmcd
#define NFS4_MAX_MINOR_VERSION 2
-
sjorge
forced 4.0 on fstab and so far no plex hang yet
-
danmcd
And further down we have rfs4_minorvers_mismatch() which says <= the above is ok.
-
danmcd
I'd be VERY interested to see if you can force it to 4.1 and if it still works.
-
tsoome
yep.
-
danmcd
Anyway, I'm thinking we should change that #define to '1'.
-
sjorge
if i can finish the movie, hung 4x in the first 20 min... with 4.0, i'll try 4.1 tomorrow
-
danmcd
sjorge: Damn, even with 4.0 you get badness. Not cool.
-
sjorge
no no, it hung 4x without it
-
sjorge
so far 8 min in with vers=4.0
-
danmcd
Oh gotcha.
-
danmcd
Enjoy your film.
-
danmcd
Found more problems...
-
danmcd
nfs4_srv_attr.c: cs.minorversion = 2;
-
sjorge
glad to be the resident canary 😅
-
sjorge
will report back after the movie, or if it hangs before
-
danmcd
rfs4_opnum_in_range() should probably not allow cs->minorversion == 2 at all.
-
danmcd
rfs4_attr_init() ==> sets this:
-
danmcd
cs.minorversion = 2;
-
danmcd
So to summarize:
-
danmcd
1.) NFS4_MAX_MINOR_VERSION should be set to 1.
-
danmcd
2.) rfs4_minorvers_mismatch() should error out above 4.1
-
danmcd
3.) rfs4_attr_init() should set cs.minorversion to 1.
-
danmcd
4.) rfs4_opnum_in_range() should likely disallow minorversion of 2.
-
danmcd
I may be missing more, but those are the low-hanging fruit. Might build a SmartOS PI with it.
-
danmcd
(And this might cause enough customer pain to force me to patch this...)
-
tsoome
2 is done with 1, isn't it?
-
danmcd
ooh you're right.
-
tsoome
I'm already on it;)
-
danmcd
You rock tsoome.
-
tsoome
I believe there was also a server info block somewhere, not sure if that's meaningful, but we should not suggest we can do 4.2
-
sjorge
did racktop not run into this? or do they already have the followup work to get us to 4.2?
-
tsoome
not heard about plex, there definitely are followup patches to improve things
-
tsoome
but not yet to get to full 4.2
-
danmcd
I can reproduce this with a boring Debian NFS client using autofs.
-
danmcd
Well, I can reproduce it thinking it's 4.2. I haven't made my illumos NFS-in-a-zone server panic yet, because I just wanted to see if the mount thought it was 4.2.
-
» danmcd uses multiple NFS servers in zones, the SmartOS ones now have the NFS4.1 code. The OmniOS one is on '046, which won't get 4.1 because it's a stable.
-
sjorge
server doesn't panic, it's plex that just hangs after logging that
-
danmcd
Oh!
-
danmcd
Well that's good. Makes it less of an oh-shit-patch-it-now from my POV.
-
danmcd
Okay, I'm disappearing. Thanks folks, and tsoome per unicast I can take any patches for quick testing vs. my Debian NFS client.
-
sjorge
still OK so far with the vers override in fstab
-
sjorge
getting hopeful this is a workaround for now
-
danmcd
(last thing I promise) Next time it's convenient for you see if you can fstab it to 4.1 and see how it does.
-
sjorge
might just be linux nfs client, ours might be fine, i only have linux clients
-
tsoome
our client only does 4.0
-
sjorge
danmcd: that's the plan after the movie is done
-
sjorge
tsoome: that's good, they prob wont try to do whatever linux tries that fails
-
sjorge
Since I'm grabbing a snack, let me switch to vers=4.1 now, all seems good with vers=4.0
-
sjorge
about 20 min on vers=4.1 without issue
-
sjorge
movie's done so I'm
-
sjorge
going to bed
-
sjorge
I spoke too soon, looks like vers=4.1 also has the plex issue
-
tsoome
hm
-
tsoome
ok, so we would still need a sample of the packets exchanged, so we can see what is going on there...
-
sjorge
i'm going to leave it at 4.0 for the weekend and see if everything works as usual
-
sjorge
if it does i'll bump it to 4.1 and try and get a packet capture
-
tsoome
ok, that would be really helpful.
-
tsoome
-
fenix
→ CODE REVIEW 3023: 15870 nfssrv: we should not accept nfs version 4.2 requests yet (NEW) |
illumos.org/issues/15870
-
danmcd
Building on SmartOS now. Can test if the automount that currently shows 4.2 will, after this fix is in place, properly show 4.1.