-
jbk
yuripv: so it doesn't have an entire network switch (complete with LLDP and dcbx) running on the nic? :)
-
gitomat
[illumos-gate] 16739 dladm: dangling pointer 'buf' to 'bbuf' may be used -- Toomas Soome <tsoome⊙mc>
-
gitomat
[illumos-gate] 16688 libipadm: storing the address of local variable -- Toomas Soome <tsoome⊙mc>
-
gitomat
[illumos-gate] 16728 libppt: not all devices have a subsystem id and subsystem vendor id -- Hans Rosenfeld <rosenfeld⊙gho>
-
rmustacc
jbk: I didn't have a real plan but probably would just fall back to something like single MSI/INTx means single rx/tx pair and group.
-
rmustacc
Just to keep life simple.
-
rmustacc
I think the HWRM interface has its limitations too. The grass isn't all greener on the other side.
-
yuripv
yep, but it looked just a bit more human-readable to me :)
-
yuripv
BTW, does it make sense to support INTx these days?
-
rmustacc
For a new PCIe card like this, likely not.
-
richlowe
I'm going to say "maybe"
-
rmustacc
pcplusmp as an interrupt distributor only gave us a very small number of interrupts.
-
rmustacc
We should definitely still continue to support INTx.
-
richlowe
'cos robert is thinking very x86-ily
-
rmustacc
Whether a driver uses it or not is very dependent on it.
-
rmustacc
But virtualization and other cases still are challenging.
-
rmustacc
Yeah, it's true I am. I guess if we really have run out because someone has loaded up the system and we only have one CPU's worth of IPL4 or IPL5 interrupts you're going to have a bad day.
-
rmustacc
In other cases like virtualization, INTx is also still important.
-
richlowe
everything is screwy, and you'll never escape
-
rmustacc
I see. FF6 doom train no escape.
-
rmustacc
If I were working on it I would probably treat non-MSI/X as basically one rx ring/one tx ring/one group.
-
rmustacc
yuripv: I think the Intel datasheets are a bit better than my memory of some of those headers.
-
rmustacc
But IIRC we also have the problem that you basically have to send the various rx and tx rings to a completion queue in bnxt.
-
rmustacc
And that the interrupt can only be enabled or disabled on a per-MSI/X basis and not a per-CQ basis.
-
rmustacc
But it's been a long time.
-
jbk
i guess the thing with the intel nics is what OSes are actually using all of that complexity with the multiple forms of virtualization, the embedded network switch, all the network services (LLDP, DCBx, ...) that run on the NIC, etc. vs just as a plain old NIC w/ a bunch of tx and rx rings?
-
rmustacc
I mean we use some of them like the VSIs and related.
-
rmustacc
Put differently, bnxt/hwrm was too simple to do the more complex rings/groups in the normal config we have in ixgbe and i40e.
-
rmustacc
Though not impossible.
-
rmustacc
But as for who uses those other things, telcos.
-
rmustacc
If you're using DCB, you need LLDP.
-
rmustacc
Not saying it's the way I would go if I was designing the NIC, but if we had access to programming the pipeline that'd be valuable.
-
jbk
I mean at least e810, it supports the OCP apis and you do get at least some of that programming IIRC (i've read through but at almost 3000 pages long, don't remember all of it :P)
-
rmustacc
It's certainly probably changed since I talked with the team in early 2020.
-
rmustacc
Look, we may complain about length and what's left out, but trust me that's much better than the no datasheet case.
-
yuripv
I have datasheet for bnxt, and it helped immediately as BARs you had in there from FreeBSD (I guess?) were incorrect, so yes, having datasheets really helps :D
-
jbk
i just mean, I think it does give you some of the access to the pipeline (which is new w/ it)... i just can't remember the details despite reading the specs (because of the size of it)
-
jbk
since it wasn't an area i'm focused on atm
-
jbk
and didn't look like it was needed to get things working, more of 'if you want to get fancy with things'
-
rmustacc
I guess put differently my currently ideal nic would have a lot more than the X520, as reliable as it is.
-
richlowe
I am so happy to exist in a world where I have never had to think about my "ideal NIC"
-
sommerfeld
jbk: hyperscalers are one market doing that sort of thing so they can (among other things) resell the whole intel CPU as "bare metal" but not give the customer raw network access.
-
jbk
yeah, but like are Linux and FreeBSD actually using all of those various virtualization features and taking advantage of the built-in switch? or are they just using it as a really fast nic w/ a bunch of rings?
-
jbk
or windows?
-
rmustacc
I mean, we use the built-in switch in i40e.
-
rmustacc
And a lot more people in those worlds that are buying aren't just using the default net setup.
-
rmustacc
Again, not defending the E810 choices.
-
jbk
we do? I mean aside from just the fact you have to touch it to make traffic flow? I don't think there are any apis in mac that'd let it basically hw offload vnic support or handle switching packets between instances on the same host (I thought mac will basically hairpin that before it can hit the driver)
-
rmustacc
I mean, we are using VSIs and VEBs to make our reality mesh.
-
rmustacc
It's not offloading, but we're using it.
-
sommerfeld
jbk: my impression is that a fair number of the weirdo features get used via userspace networking. Things like:
research.google/pubs/snap-a-microkernel-approach-to-host-networking
-
rmustacc
But not sure it's really worth the distinction. As we're just doing what we need.
-
rmustacc
Yeah, there's definitely a lot that gets used that way.
-
sommerfeld
jbk: holy grail for some is to get the host/hypervisor stack out of the datapath for guest traffic. You give each guest some number of specially configured tx/rx rings and their traffic gets encapsulated by the NIC directly onto the wire.
-
richlowe
that makes sense
-
jbk
yeah, i mean the nic allows you to do that which is neat, i was just wondering if there is any hypervisor that's actually making use of it though? e.g. is there any sort of kernel API on linux or windows or such that'd allow you to hook into all of that stuff?
-
richlowe
I mean, in the google case that isn't going to matter
-
sommerfeld
jbk: most of the detail work in that sort of thing would likely be in privileged userspace code talking to device control queues with minimal kernel involvement.
-
rzezeski
I think the thing to keep in mind is this is all based on IEEE standards. And that it was created to solve issues that developed when you moved the VM network switching from the hypervisor to things like VMDq and SR-IOV, providing VFs to guests and taking the hypervisor out of play for data flow. And I think the standards are an attempt to make it easier to consistently deploy ACL/QoS rules across the virtual network.
-
rzezeski
That said I have no idea how widely used they are or who uses them. And certainly no one is using this stuff in illumos-land.
-
rzezeski
But at this point, if you want to program these new Intel NICs, this is the language you need to speak.
-
rzezeski
I will also say that, as much as I have cursed i40e in the past, now that I have experience writing the ENA driver and getting Chelsio T7 up: Intel's programming manuals are pretty great. They might be wrong in places, and you should tread carefully; but they do a good job of describing the various aspects of how to program their NICs.