10:38:56 <{3168}> Hello
12:42:45 [illumos-gate] 15846 Restructure CPU microcode update -- Andy Fiddaman
18:58:13 [illumos-gate] 15866 nfssrv: suggest parentheses around assignment used as truth value -- Toomas Soome
19:56:16 Seems the nfs changes have, for now at least, broken plexmediaserver running from an nfs share
19:56:18 ERROR - [Req#9c0] Waited over 10 seconds for a busy database; giving up.
19:56:41 Get these all the time now, but not if I revert to omnios-bloody-20230817
19:56:49 only big diff seems to be the nfs stuff
19:56:59 Can I just mount it with nfsvers=4.0 to get the old behavior?
20:02:29 linux seems to autodetect vers=4.2, but I think the last batch of nfs stuff was not 4.2 yet but 4.1?
20:06:53 4.1, yes
20:06:55 sjorge: Can you file an issue?
20:07:02 It's important that we back it out promptly if it is broken
20:07:11 I'm talking with papertigers
20:07:14 ok thanks
20:07:17 His mount -t nfs4 shows
20:07:28 vers=4.0
20:07:37 where mine shows vers=4.2
20:07:41 He's on omnios stable
20:08:05 Pre-changes, mine also showed vers=4.0
20:11:41 is it possible to get a saved packet stream from the time of the issue?
20:12:53 Not sure I can isolate it nicely
20:13:13 I can instantly trigger it by playing a movie and having plex update the watch status in the db
20:13:24 But that dump would be a few gig after a few minutes
20:14:19 see snoop -O option
20:15:32 you can limit the size of the output, just need to terminate snoop quickly enough
20:16:02 https://www.illumos.org/issues/15869
20:16:03 → BUG 15869: Issue with linux mounting nfs after 15405 (New)
20:16:11 It's pretty late, so back on the older BE
20:16:17 I'll see if I can get a dump of sorts tomorrow
20:17:35 I'll try once more on the new BE with nfsvers=4.0 in the mount options, to see if it's something triggered by the linux nfs client only when in 4.2 mode
20:20:11 Interesting start: httpd (also pulling in stuff via nfs, like the DocRoot and config dir) now starts again at system boot with nfsvers=4.0. Before, it would not start at boot but started fine afterwards; I assumed that was unrelated, but it might not have been
20:20:40 But rather tired, gonna try and finish watching the movie and if it crashes go to bed; if it doesn't, it's an interesting datapoint for tomorrow when I am less sleepy
20:28:48 Oh my... yeah, we seem to be advertising that we support 4.2:
20:28:51 build.work.kebe.com:/export/home on /net/build.work.kebe.com/export/home type nfs4 (rw,nosuid,nodev,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.24.4.180,local_lock=none,addr=172.24.4.184)
20:29:23 Hasn't crashed yet, but I'll bet if I do something that triggers something NFS4.2-specific, whammo!
20:30:39 well plex is using a modified sqlite db
20:30:57 so not ideal
20:31:19 but for single-host access it was working fine for years
20:31:39 sjorge: I might have a patchy/hacky fix. Examine $UTS/common/fs/nfs/nfs4_dispatch.c
20:31:47 #define NFS4_MAX_MINOR_VERSION 2
20:32:10 forced 4.0 in fstab and so far no plex hang yet
20:32:16 And further down we have rfs4_minorvers_mismatch(), which says <= the above is ok.
20:32:26 I'd be VERY interested to see if you can force it to 4.1 and if it still works.
20:32:41 yep.
20:33:00 Anyway, I'm thinking we should change that #define to '1'.
20:33:15 if I can finish the movie, hung 4x in the first 20 min... with 4.0, I'll try 4.1 tomorrow
20:33:35 sjorge: Damn, even with 4.0 you get badness. Not cool.
20:33:56 no no, it hung 4x without it
20:34:06 so far 8 min in with vers=4.0
20:34:09 Oh gotcha.
20:34:14 Enjoy your film.
20:34:19 Found more problems...
20:34:27 nfs4_srv_attr.c: cs.minorversion = 2;
20:34:38 glad to be the resident canary 😅
20:34:52 will report back after the movie, or if it hangs before
20:35:16 rfs4_opnum_in_range() probably should not allow cs->minorversion == 2 at all.
20:39:00 rfs4_attr_init() ==> sets this:
20:39:06 cs.minorversion = 2;
20:40:29 So to summarize:
20:40:38 1.) NFS4_MAX_MINOR_VERSION should be set to 1.
20:40:57 2.) rfs4_minorvers_mismatch() should error out above 4.1
20:42:46 3.) rfs4_attr_init() should set cs.minorversion to 1.
20:43:08 4.) rfs4_opnum_in_range() should likely disallow a minorversion of 2.
20:43:26 I may be missing more, but those are the low-hanging fruit. Might build a SmartOS PI with it.
20:43:41 (And this might cause enough customer pain to force me to patch this...)
20:44:02 2 is done with 1, isn't it?
20:44:35 ooh you're right.
20:44:46 I'm already on it ;)
20:45:03 You rock tsoome.
20:45:43 I believe there was also a server info block somewhere; not sure if that's meaningful, but we should not suggest we can do 4.2
20:47:14 did racktop not run into this? or do they already have the followup work to get us to 4.2?
20:47:52 not heard about plex; there definitely are followup patches to improve things
20:48:04 but not yet to get to full 4.2
20:48:42 I can reproduce this with a boring Debian NFS client using autofs.
20:49:21 Well, I can reproduce it thinking it's 4.2. I haven't made my illumos NFS-in-a-zone server panic yet, because I just wanted to see if the mount thought it was 4.2.
20:50:14 * danmcd uses multiple NFS servers in zones; the SmartOS ones now have the NFS4.1 code. The OmniOS one is on '046, which won't get 4.1 because it's a stable release.
20:50:44 the server doesn't panic, it's plex that just hangs after logging that
20:51:35 Oh!
20:51:47 Well that's good. Makes it less of an oh-shit-patch-it-now from my POV.
20:52:17 Okay, I'm disappearing. Thanks folks, and tsoome, per unicast I can take any patches for quick testing vs. my Debian NFS client.
20:52:33 still OK so far with the vers override in fstab
20:52:48 getting hopeful this is a workaround for now
20:53:31 (last thing I promise) Next time it's convenient for you, see if you can fstab it to 4.1 and see how it does.
20:53:33 might just be the linux nfs client; ours might be fine, I only have linux clients
20:53:49 our client only does 4.0
20:54:14 danmcd: that's the plan after the movie is done
20:55:18 tsoome: that's good, they probably won't try to do whatever linux tries that fails
21:02:58 Since I'm grabbing a snack, let me switch to vers=4.1 now; all seems good with vers=4.0
21:33:49 about 20 min on vers=4.1 without issue
21:38:30 movie's done so I'm
21:38:37 going to bed
21:59:09 I spoke too soon, looks like vers=4.1 also has the plex issue
22:13:12 hm
22:14:10 ok, so we would still need a sample of the packets exchanged, so we can see what is going on there...
22:32:15 I'm going to leave it at 4.0 for the weekend and see if everything works as usual
22:32:37 if it does I'll bump it to 4.1 and try and get a packet capture
22:46:27 ok, that would be really helpful.
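Why vers=4.0 in fstab changed anything can be sketched as a minor-version probe. This is a minimal sketch, assuming the Linux client (when no vers= is given) starts at the highest minor version it knows and steps down each time the server answers NFS4ERR_MINOR_VERS_MISMATCH (10021 in RFC 7530). `server_max_minor` stands in for NFS4_MAX_MINOR_VERSION in nfs4_dispatch.c; `server_check_minorversion()` and `client_negotiate_minor()` are hypothetical names, not the illumos or Linux code. It only models the negotiation, not the plex hang, which the log shows also occurred on vers=4.1.

```c
/* NFSv4 status codes (RFC 7530). */
#define NFS4_OK                     0
#define NFS4ERR_MINOR_VERS_MISMATCH 10021

/*
 * Hypothetical server-side gate: server_max_minor plays the role of
 * NFS4_MAX_MINOR_VERSION. A COMPOUND whose minorversion exceeds it is
 * refused with NFS4ERR_MINOR_VERS_MISMATCH.
 */
static int
server_check_minorversion(unsigned int minorversion,
    unsigned int server_max_minor)
{
	if (minorversion > server_max_minor)
		return (NFS4ERR_MINOR_VERS_MISMATCH);
	return (NFS4_OK);
}

/*
 * Hypothetical client probe: try client_max_minor first and step down
 * on a mismatch. A vers=4.x mount option corresponds to capping
 * client_max_minor at x. Returns the negotiated minor version, or -1
 * if the server refuses every minor version tried.
 */
static int
client_negotiate_minor(unsigned int client_max_minor,
    unsigned int server_max_minor)
{
	for (int minor = (int)client_max_minor; minor >= 0; minor--) {
		if (server_check_minorversion((unsigned int)minor,
		    server_max_minor) == NFS4_OK)
			return (minor);
	}
	return (-1);
}
```

With `server_max_minor` at 2, as the #define was at the time, `client_negotiate_minor(2, 2)` returns 2, matching the vers=4.2 in the mount output above; setting the #define to 1 makes the same probe settle on 1, and vers=4.0 in fstab amounts to passing 0 as `client_max_minor`.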
22:46:52 https://code.illumos.org/c/illumos-gate/+/3023 is up as well.
22:46:53 → CODE REVIEW 3023: 15870 nfssrv: we should not accept nfs version 4.2 requests yet (NEW) | https://www.illumos.org/issues/15870
23:28:49 Building on SmartOS now. Can test if the automount that currently shows 4.2 will, after this fix is in place, properly show 4.1.
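Item 4 of the summary (disallowing minorversion 2 in rfs4_opnum_in_range()) can be sketched as a per-minor-version operation range check. This is a minimal sketch assuming the server implements only 4.0 and 4.1; `opnum_in_range_sketch()` is a hypothetical name and does not reproduce the real function's signature or body. The op numbers are from the NFSv4.0 and NFSv4.1 specifications (RFC 7530, RFC 8881).

```c
#include <stdbool.h>

/* NFSv4 operation numbers (RFC 7530 / RFC 8881). */
#define OP_ACCESS            3   /* lowest-numbered NFSv4.0 operation */
#define OP_RELEASE_LOCKOWNER 39  /* highest-numbered NFSv4.0 operation */
#define OP_RECLAIM_COMPLETE  58  /* highest-numbered NFSv4.1 operation */

/*
 * Hypothetical analogue of the rfs4_opnum_in_range() idea: an op is
 * accepted only if the COMPOUND's minor version is supported and the
 * op number exists in that minor version. Simplification: ops that
 * NFSv4.1 obsoletes (e.g. SETCLIENTID, RENEW) are not excluded here.
 */
static bool
opnum_in_range_sketch(unsigned int minorversion, unsigned int op)
{
	unsigned int max_op;

	switch (minorversion) {
	case 0:
		max_op = OP_RELEASE_LOCKOWNER;
		break;
	case 1:
		max_op = OP_RECLAIM_COMPLETE;
		break;
	default:
		/* 4.2 and later are not supported yet: nothing is in range. */
		return (false);
	}
	return (op >= OP_ACCESS && op <= max_op);
}
```

The default branch is the point of the change: once 4.2 is no longer accepted, a COMPOUND tagged minorversion 2 should match nothing, rather than being checked against the 4.1 op range.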