-
tozhu
hello all, does illumos / SmartOS support SR-IOV? can RDMA run with a bhyve VM?
-
tozhu
thanks
-
sjorge
i'm 90% sure we do not do SR-IOV
-
tozhu
sjorge: if so, RDMA can't be used with a bhyve VM?
-
sjorge
I doubt that would work
-
tozhu
where are the related docs? I have not found out how to use RDMA with bhyve, e.g. for NVMe-oF (NVMe over RDMA/RoCEv2)
-
tozhu
any advice is appreciated, best wishes
-
jbk
i think your best bet would be doing pci pass through of the device if possible..
-
copec
Most of the SR-IOV configuration to create the virtual PCI functions is done in the driver for the respective devices when they are set up at boot
-
rzezeski
For our NIC drivers we tend to prefer to use the device's packet processing and classification features to steer traffic into dedicated rings which can themselves be mapped to individual VNICs. For example, Intel's VMDq support. I don't believe we've set up any SR-IOV support in any of our drivers, at least not the ones I'm familiar with.
-
danmcd
rzezeski is correct. And our mlxcx(4D), which drives CX-4, does not.
-
copec
danmcd - to be clear to myself - mlxcx uses the packet processing and classification features, but does not set up SR-IOV?
-
danmcd
mlxcx does exactly what rzezeski says, so yes, no SR-IOV.
-
gitomat
[illumos-gate] 16228 convert git-pbchk.1onbld to mdoc -- Bill Sommerfeld <sommerfeld⊙ho>
-
danmcd
Looping back on a conversation from a couple of weeks ago, I have taken a 240GB NVMe drive, used format(8) on it to slice it up into three roughly-equal-sized partitions, and now have slice0 of it as my zpool's slog.
-
danmcd
Given this device doesn't SEEM to have any namespaces beyond the one, it seemed the best thing to do.
-
copec
I've often had to slice SSDs in the past for slog/l2arc purposes (working primarily for small businesses, we are usually dealing with limited physical disk space), and even though it isn't as clean, I haven't had a problem with it.
-
nomad
I've done it too, despite the heckling from the openzfs irc channel.
-
jbk
danmcd: speaking of mlxcx, have you had any luck w the 100gb cards and using a breakout cable into 4x25gb SFPs?
-
jbk
or is that something that's even been attempted?
-
rmustacc
Treating it as 4 distinct interfaces?
-
jbk
no.. just one of them
-
rmustacc
Is the far end joining them together?
-
danmcd
@jbk --> forgot to update on that.
-
danmcd
Turns out I had a test-tool problem. Namely, iperf3 is single-threaded, something that, unlike with netperf, isn't made plainly obvious.
-
danmcd
Once I started concurrent iperf3s on different ports I got better results esp. on the EPYC.
-
danmcd
It needs work (tops out at 50-60Gbits w/o any LSO or some of arekinath's other goodies) but it's not as horrid as I'd feared.
-
danmcd
I'll be very interested to see when they get returned to their low-powered original home.
-
jbk
i think the idea is to plug a 100gb NIC into a 25gb switch port.. I'm far, far away from the physical hardware, but I'm guessing there's some sort of form factor issue where you can't just plug it in and have it auto-negotiate down
-
jbk
and need different sfps
-
sommerfeld
that sounds vaguely familiar. jbk: probably sfp28 (25gbit/s) vs qsfp28 (4x25gbit/s)
-
sommerfeld
(there is also apparently OSFP out there -- 8x whatever)
-
jbk
yeah.. IIUC, the card itself only accepts qsfp28
-
rzezeski
danmcd: honestly, that's pretty decent compared to what I would have expected you to report. so that's good
-
danmcd
rzezeski: that's with our new-DC-build EPYC server. The test machines I am now back to are literally Atom (Skylake era). I'm literally about to see how they do again, now that I understand WTaF was going on.
-
danmcd
Those numbers are also with MTU=9k and max_buf cranked to 16M (prob. overkill on that one).
-
rzezeski
If it's for intra-DC comms I think large MTU is totally fair, we can have nice things.
-
danmcd
Ahhh I'm seeing the CPU limitations (of course, one wonders how much of that is code we have that could do better with less).
-
jbk
danmcd: i take it the system must have a lot of ram :)
-
danmcd
The EPYC, oh yes.
-
jbk
i think i mentioned, we had a system where the integrator messed up and shipped it w/ 64gb of ram
-
jbk
turns out that wasn't enough for 2 4-way 25gb aggrs + vnics w/ mlxcx :)
-
rmustacc
There are tunables you can adjust on all that you know, right?
-
danmcd
The driver memory, yes.
-
jbk
i was more impressed than anything
-
jbk
(and once we fixed the integrator's goof-up, it was fine)
-
neirac
I just tested OCI out of boredom again. I still need to set pit_is_broken=1 to be able to boot, and virtio_scsi still does not work: disks are not available
-
neirac
this is what I have for prtconf -vp
-
neirac
the odd thing is that SmartOS booted without modifying pit