-
spuos
does anyone know where I'd look for illumos-specific nfs tuning?
-
danmcd
1.) Make sure you have a dedicated log device on your NFS server's ZFS pool.
-
danmcd
2.) After that, identify other bottlenecks?
-
danmcd
3.) Make sure you ask one of the Big Questions: What problem are you really trying to solve?
-
danmcd
NFS is kneecapped by latency, generally speaking.
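[editor's note] danmcd's point 1, a dedicated log device (SLOG), is attached with zpool(8). A minimal sketch, where the pool name "tank" and the device names are placeholders for your own:

```shell
# Attach a dedicated log (SLOG) device to an existing pool.
# "tank" and "c1t2d0" are hypothetical names; substitute your own.
zpool add tank log c1t2d0
# A mirrored log is safer against device failure:
# zpool add tank log mirror c1t2d0 c1t3d0
```

The SLOG matters here because NFS generates many synchronous writes, which land on the log device before being committed to the main pool.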
-
spuos
danmcd: I'm trying to solve the issue that is "my NFS is slow for remote hosts on a VPN, but not the one local one", I don't understand 3, what do you mean?
-
spuos
as for 1), guess who decided to buy one of those fancy power-backed ram disks :)
-
spuos
but, how would I find those other bottlenecks?
-
nomad
check for jumboframes locally and on the VPN. Also, if it's a VPN, how fast is the connection?
-
spuos
nomad: should I be using jumboframes?
-
nomad
that's a difficult question these days. I was suggesting that if you have them locally but not on the VPN, that could be the problem.
-
nomad
however, if you don't have them set anywhere then that isn't it.
-
nomad
still, the more important question is if your VPN can even support the speed you are looking for.
-
spuos
is that any different than the MTU for the VPN?
-
nomad
if it's over the Internet then no amount of tuning will make up for it.
-
nomad
jumboframes is an MTU setting, yes.
-
nomad
but please stop focusing on that if you're not using it locally. It's not going to make a difference.
-
spuos
yeah, it's internet, MTU is 1420
-
spuos
nomad: not focusing on jumboframes, you mean? also it's 1500 for my lan.
-
nomad
the jumboframes aren't the issue.
-
nomad
you still haven't answered the question about the VPN speed.
-
spuos
nomad: that's because I'm trying to find that out atm
-
danmcd
VPN introduces latency. Look at the ping times between your two NFS paths.
-
danmcd
shell% ping -svn nfs-dest-over-vpn 1024 10
-
danmcd
then
-
danmcd
shell% ping -svn nfs-dest-for-local-host 1024 10
-
danmcd
(I'm assuming your NFS clients are illumos. If they aren't, use `ping -c 10 -s 1024 nfs-dest`.)
-
danmcd
Latency is what's gonna kill you on NFS. Every... damned... time...
-
spuos
danmcd: sub-1 ms local vs ~135 ms over the VPN. So assuming I still want to use NFS, are there still things I can do?
-
danmcd
Maybe increase the TCP (you're using TCP, right?) buffer sizes?!?
-
andyf
You might want to check whether you're using the TCP cubic congestion algorithm. I've seen problems with that (we should pull in the latest changes from FreeBSD there).
-
danmcd
That's tunable per-netstack (usually that's synonymous with per-zone). Use ipadm(8) to adjust max_buf, then send_buf and recv_buf.
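[editor's note] A sketch of what danmcd describes, using ipadm(8); the buffer sizes shown are illustrative assumptions, not recommendations — pick values suited to your bandwidth-delay product:

```shell
# Inspect the current TCP buffer properties (per netstack/zone).
ipadm show-prop -p max_buf,send_buf,recv_buf tcp
# Raise the ceiling first, then the per-connection defaults.
# 4 MB / 1 MB are hypothetical example values.
ipadm set-prop -p max_buf=4194304 tcp
ipadm set-prop -p send_buf=1048576 tcp
ipadm set-prop -p recv_buf=1048576 tcp
```

For a 135 ms round trip, the buffer must cover at least bandwidth × delay (e.g. ~1.7 MB for 100 Mbit/s at 135 ms) or throughput is capped regardless of link speed.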
-
andyf
ipadm show-prop tcp
-
spuos
I assumed latency would make it slower, but I didn't think it would be responsible for locking things up to the point of being basically non-functional when a larger file is attempted, and then failing
-
danmcd
oooh, I haven't changed anything from Sunreno, and cubic was on my list. Yes @andyf we should accept updates there.
-
danmcd
Oh wait... see THIS IS WHY I ASKED WHAT PROBLEM ARE YOU REALLY TRYING TO SOLVE?
-
spuos
yeah I'm using tcp too, but what's that about cubic congestion?
-
danmcd
"larger file is attempted then failing" could be symptomatic of OTHER things as well.
-
spuos
danmcd: I said I didn't get what you meant lol
-
nomad
as a general rule, routed NFS is not going to give you the best results; the more hops, the worse the results. If you really need to push the packets to a remote location then you aren't going to see the same speed/throughput as local.
-
danmcd
Simple smoke test: Can you scp or curl a large file from $NFS_SERVER to $VPN_NFS_CLIENT ?
-
spuos
yeah, all the time
-
danmcd
Okay. So it does fall back to NFS-specific, and yeah, nomad is right.
-
nomad
I'd still verify your MTU discovery is correct, though if you can scp large files then that's *probably* not the problem.
-
nomad
check the MTU between all involved network devices, just to be sure.
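[editor's note] A sketch of nomad's MTU check; the Linux ping syntax is an assumption about the client OS, and the payload size follows from the 1420-byte tunnel MTU mentioned above:

```shell
# On illumos, inspect the configured MTU on each datalink:
dladm show-linkprop -p mtu
# Probe the path MTU over the VPN from a Linux client
# (1392 = 1420 tunnel MTU minus 28 bytes of IP + ICMP headers;
#  -M do sets "don't fragment", so an oversized probe fails loudly):
# ping -M do -s 1392 nfs-server-over-vpn
```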
-
nomad
and with that, I'm out. Have fun and good luck.
-
spuos
well, that does bring me to my original hope, which is tuning to the best I can for that latency.
-
spuos
thanks nomad, will do.
-
andyf
For the cubic algorithm, see what you're using in the output of `ipadm show-prop tcp`, and if you're using cubic see if things improve with `sunreno`
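[editor's note] andyf's check and switch, sketched with ipadm(8) (property names as documented there; verify on your release):

```shell
# Show the default and available congestion-control algorithms:
ipadm show-prop -p cong_default,cong_enabled tcp
# Switch the default back to sunreno if cubic is misbehaving:
ipadm set-prop -p cong_default=sunreno tcp
```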
-
andyf
I think the omnios default is cubic these days. Not sure about other distributions.
-
spuos
I'm on sunreno
-
danmcd
SmartOS is still sunreno.
-
spuos
SunOS vega 5.11 omnios-r151054-f66c95f374
-
danmcd
Plus SmartOS native NGZs have whatever they used in the past that persists IIRC. Been a while since I've mucked with something like that.
-
spuos
seems omnios is still sunreno
-
andyf
Oh, we may only have thought about changing the default then
-
spuos
so, back to the main NFS question, does anyone know where I'd find resources for NFS tuning? I'd still like to see if it's possible to get better results for high latency.
-
jclulow
spuos: I think some of the things you can do are related to mount options on the client; e.g., see wsize/rsize in mount_nfs(8). There are options that are less safe, but which may improve performance where the link latency is high, like "nocto".
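[editor's note] A sketch of jclulow's client-side suggestion; the server name, export path, and sizes are assumptions, and `nocto` trades close-to-open consistency for fewer synchronous round trips — see mount_nfs(8) before using it:

```shell
# Hypothetical illumos client mount over the VPN.
# Larger rsize/wsize amortize the per-RPC round-trip cost;
# nocto is the "less safe" knob mentioned above.
mount -F nfs -o vers=4,rsize=1048576,wsize=1048576,nocto \
    nfs-server:/export/data /mnt/data
```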
-
jclulow
The client application actually ends up being pretty important for tuning purposes, too. For example, tar(1) is a pretty pathological workload for NFS: serially opening a file, doing some I/O, closing, moving on. Programs with multiple threads, where many I/Os are issued in parallel, are likely to do better, just for regular bandwidth/delay reasons.
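[editor's note] The parallelism point can be sketched in shell: keep several I/Os in flight instead of one at a time. In practice the source would sit on the NFS mount; the paths here are local placeholders so the sketch runs anywhere:

```shell
# Placeholder directories standing in for an NFS-mounted source.
SRC=/tmp/nfs_src
DST=/tmp/nfs_dst
mkdir -p "$SRC" "$DST"
for i in 1 2 3 4 5; do printf 'payload %s\n' "$i" > "$SRC/file$i"; done
cd "$SRC" || exit 1
# 8 concurrent cp processes, one file per invocation; over a 135 ms
# link this beats a serial "for f in *; do cp ..." loop because the
# round trips overlap instead of accumulating.
printf '%s\n' * | xargs -P 8 -I{} cp "{}" "$DST/"
```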
-
jclulow
It's going to be hard to paper over 150 ms of latency in general. It shouldn't be _unreliable_ though, just slow.
-
jclulow
(assuming there isn't a huge amount of packet loss as well, etc)
-
richlowe
ancient memories of AFS over real distances and normal connections
-
richlowe
it's going to work, but who knows when