-
Smithx10
copec: I explored fast NAT a bit... and it's pretty hilarious how fast that rabbit hole goes, sad face, lol
-
copec
Yeah. I had to upgrade my old home server to do 1Gb and saturating the link pretty much eats all the CPU of it. So I'll have to probably get a real router whenever I upgrade from that
-
danmcd
You might wanna try the BHYVE appliance approach first. (Unless @Smithx10 you ARE talking about a bhyve-based NAT...)
-
danmcd
There's potential to build a better NAT for illumos. A very larval approach (that focusses on vxlan internal nets) is in the `nat-reform` branch of illumos-joyent.
-
danmcd
(tl;dr ==> Use conn_t infrastructure for NAT sessions, rewire conn_recv & conn_xmit, etc. etc.)
-
copec
I tried OPNsense, and it is really nice, but it was only slightly ahead of the illumos NAT in CPU usage
-
copec
I wish I was an experienced C dev and could jump into something like that (or Rust, are we using any Rust in the kernel?)
-
» copec checks out nat-reform branch
-
Smithx10
danmcd: I was looking at whether there was an eBPF firewall on Linux that did NAT (there wasn't anything concrete I found) and then it goes into hardware-offloaded conn tracking in Mellanox cards lol
-
Smithx10
I dont have that kind of need for NAT speed lol
-
danmcd
copec: --> yeah, pfSense or OPNsense was what I was hearing about.
-
danmcd
copec: the `vxlnat` sources are where you should look.
-
jbk
speaking of eBPF...
-
copec
eBPF is really neat, but it is sort of crawling at something from the wrong direction. Sort of like making containers in Linux vs Solaris Zones. I think to be done right it needs to be built with very specific goals to begin with.
-
copec
I would actually say MS Singularity would be a good example of coming at it from the right way to begin with, but I reveal my stupidity
-
copec
^For dynamic kernel programming
-
jbk
namespaces are fine for what google designed them for (workload isolation), but the 'ikea' approach to making them secure vs starting from a secure base is I think flawed
-
copec
I wonder if Sun's JavaOS explored this same thing?
-
copec
I'm on a random topic kick
-
copec
I would guess not quite, probably another hip microkernel thing that used protected memory for each process
-
openstandards
Hi all, just tried using a Mellanox ConnectX-4 with SmartOS, however I'm facing an issue with the card and it's basically throwing me into maintenance mode
-
openstandards
Ended up doing modunload on the kernel module but don't know where to go from there
-
danmcd
Odd, CX-4 should be well supported by mlxcx(4D). Any messages in /var/adm/messages* from mlxcx?
-
openstandards
danmcd: can't seem to find any messages related to mlxcx
-
openstandards
I do however have the errors that are forcing me into maintenance mode
-
openstandards
mlxcx0: limiting number of rx groups to 127 based on max number of rx flow tables
-
openstandards
mlxcx0: command mlxcx0 mlxcx_op_access_reg 0x805 failed with status code mlxcx_cmd_r_bad_param
-
openstandards
mlxcx0: failed op_access_reg was for register 9009 (mtcap)
-
danmcd
Those last two are interesting. You may wish to ask about them on #illumos to see if anyone (esp. arekinath if he's around) recognizes them.
-
danmcd
There might come a point I dive into mlxcx myself for other reasons, but alas that's not now; sorry I can't be of more immediate assistance.
-
danmcd
I *do* know that CX-4 and CX-5 cards have been known to work with mlxcx(4D).
-
jbk
you might also want to check firmware versions
-
jbk
unfortunately, i don't think we currently support updating the firmware within the OS for that card
-
openstandards
Thank you both for your assistance, I'll have a go at flashing with a newer firmware tonight to see if the experience improves
-
openstandards
I'll have a go at flashing with my fedora desktop
-
barfield
danmcd: I received this error in dmesg after booting a Dell FC series chassis blade (6300 model). I am not sure how I would go about complying with the error, any advice? "2023-06-23T16:40:09.185026+00:00 48-4d-7e-58-c0-e1 i40e: [ID 517869 kern.info] NOTICE: i40e0: The driver for the device detected a newer version of the NVM image (1.12) than expected (1.10).#012Please install the most recent version of the network driver.#012"
-
danmcd
You can ignore that unless Bits Are Not Moving. It's a safety check put in by Intel-written code.
-
danmcd
You can always attempt to update the FW on the i40e, but as with all FW updates, it can get tricky.
-
danmcd
[root@curly (kebecloud) ~]# dladm show-phys | grep i40e | grep up | wc -l
-
danmcd
3
-
danmcd
[root@curly (kebecloud) ~]# grep "The driver for the " /var/adm/messages* | wc -l
-
danmcd
24
-
danmcd
[root@curly (kebecloud) ~]#
-
barfield
Is there a utility for firmware updates on illumos? Back in the day, when GRUB was the bootloader, we would boot into firmware update mode; that was just another boot option, with FreeDOS bundled
-
danmcd
I wouldn't worry, again, unless BITS ARE NOT MOVING.
-
barfield
[root@48-4d-7e-58-c0-e1 ~]# grep -c i40e /var/adm/messages
-
barfield
28
-
danmcd
And if you're running X722 over BaseT, be aware of jinni illumos#13230
-
barfield
To my knowledge I am not
-
barfield
1 more question for you, in the current PI is dl memory managed better now on the i40e cards? I've had issues with bhyves and i40e's fighting over vmm
-
barfield
I'm wondering if anyone in here has messed around with bhyve instances assigned 512GB of RAM or more too; any input is appreciated. I haven't googled this yet, so apologies in advance if bhyve doesn't yet support this much RAM.
-
barfield
my 128 and 256GB instances already take 2 solid minutes to transition to started state in vmadm lol
-
barfield
But they're lightning fast once started
-
barfield
papertigers: may have some insight on the i40e/bhyve ram fighting. I think that I remember he gave me the intel on the issue to begin with.
-
danmcd
What PI are you running now?
-
barfield
latest as of yesterday from wiki.smartos.org...vanilla usb img. Simulating workloads on some nodes that I put onsite at a customer location in an effort to sell them on ditching their metal and move into our Triton cloud.
-
barfield
09:40:09 2021
-
barfield
14:46 -!- Irssi: Join to #smartos was synced i
-
barfield
whoops
-
barfield
20230615T000418Z
-
danmcd
So release-20200730 and later contain jini illumos#12958 and illumos#12957
-
barfield
thanks jini
-
danmcd
that improves things somewhat, but if a new zone is spun up it can freeze up because the allocation for i40e VNICs does the painful part up-front.
-
barfield
Yeah I want to say it was mustachi who mentioned that part before. Though I could be wrong...CRS is in high-gear
-
danmcd
(The workaround for it is to touch a memory global in kernel using mdb -k and cross-your-fingers it picks up steam, followed by tweaking-back-to-its-default)
-
danmcd
"15:55 barfield: my 128 and 256GB instances already take 2 solid minutes to transition to started state in vmadm lol "
-
barfield
Is it something that could be updated in kernel boot options?
-
danmcd
Yeah, that's symptomatic. Tell me, is this at start time?
-
barfield
Correct
-
danmcd
Hmmm....
-
danmcd
Can you see what the ptree(1) is of the zones' procs when it's hung?
-
barfield
`vmadm start UUID-with-128GB-ram` is like waiting a decade for `vmadm console` to spit stuff out
-
danmcd
Yes... but between start and visible progress, what does "ptree `pgrep -z <zone-uuid>`" say?
-
barfield
Next time I get a chance to reboot one absolutely. I am about to build a VM for the POC customer and I can do it when I get one of those booted as well. The NIC in these FC blades are i40e, but they're technically 4 10GB SFP+ interfaces. The i40e issues I've been dealing with are on native QSFP+ interfaces
-
barfield
I will get that for you, maybe today/night worst case Monday
-
danmcd
Actually -z might yield nothing as the zone isn't started fully yet. If there's a dladm(8) process running relating to the booting zone, it's more-than-likely that memory-up-front problem.
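A minimal sketch of that check, to run while `vmadm start` appears hung. The function name `show_dladm_trees` is hypothetical; it only combines `pgrep` and `ptree` to show which process spawned any running dladm(8):

```shell
#!/bin/sh
# Hypothetical helper: print the process tree for every running dladm(8).
# A dladm process tied to the booting zone suggests the up-front
# vNIC memory allocation is what the start is waiting on.
show_dladm_trees() {
    # pgrep exits non-zero when nothing matches
    pids="$(pgrep -x dladm 2>/dev/null)" || {
        echo "no dladm process running"
        return 0
    }
    for pid in $pids; do
        ptree "$pid"
    done
}
```

Calling `show_dladm_trees` a few times between `vmadm start` and the first console output shows whether dladm is the long-running step.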
-
barfield
Ahhh, so I didn't realize that the dladm issue was happening during VM startup. I thought it was during boot, when all of the vnics are associated
-
danmcd
New VM, new vNICs
-
barfield
How much RAM typically, theoretically or as best practice, should one reserve for the global zone if using 40 or 100GB interfaces?
-
barfield
My previous issues were bhyves crashing/hanging
-
barfield
because I didn't have enough GZ ram reserved on the hosts
-
danmcd
I don't know off the top of my head.
-
barfield
I'm just wondering if an equivalent to wirespeed would be sufficient
-
barfield
Have 40GB NICS reserve 40GB ram
-
barfield
May be too basic lol...just thinking out loud I suppose
-
barfield
I dont know if I told anyone yet but I am no longer at Joyent
-
danmcd
It's a function of how-many-vnics are over it.
-
danmcd
So you're officially full time on your side-gig now?
-
barfield
Si Senor
-
barfield
OCP Full speed
-
danmcd
And I remember to unblock it's:
-
barfield
SmartCloud Solutions, we're rolling up into a bigger company now though
-
danmcd
echo "needfree/W0x4000" | mdb -kw
-
danmcd
<wait for things to clear>
-
barfield
Everything going good for the commercial team now that it's MNX?
-
barfield
Thank you sir!
-
danmcd
echo "needfree/W0x0" | mdb -kw
-
danmcd
So far so good.
-
barfield
I'll document that now in our SOP docs
-
danmcd
It's a big hammer, be careful using it.
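For an SOP, the needfree workaround above can be wrapped so that it defaults to printing the commands instead of running them. This is only a sketch: `kmdb_write`, `nudge_needfree`, `DRY_RUN` and `WAIT_SECS` are hypothetical names, the 30-second wait is an arbitrary placeholder for "wait for things to clear", and both writes use `mdb -kw` because mdb only modifies the live kernel when opened with `-w`.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints what would be run.
DRY_RUN="${DRY_RUN:-1}"
# Placeholder wait between the two writes; tune to how long things take to clear.
WAIT_SECS="${WAIT_SECS:-30}"

# Hypothetical helper: send one mdb write command to the live kernel.
kmdb_write() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: echo "%s" | mdb -kw\n' "$1"
    else
        echo "$1" | mdb -kw
    fi
}

nudge_needfree() {
    kmdb_write "needfree/W0x4000"                 # raise needfree to unblock
    [ "$DRY_RUN" = "1" ] || sleep "$WAIT_SECS"    # <wait for things to clear>
    kmdb_write "needfree/W0x0"                    # tweak back to the default
}
```

Calling `nudge_needfree` with the defaults prints the two commands; setting `DRY_RUN=0` performs the writes, so the "big hammer" warning applies in full.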
-
barfield
Excellente. We're building awesome stuff and hope to share some business with Nick soon. I can't talk too much about it but SmartCloud is #3 globally under Meta and Nokia in terms of OCP organic traffic impressions. Only integrated solutions provider in the US and only provider certified on SmartOS
-
barfield
OCP Inspired+OCP Certified is the proper term. I saw that Oxide was giving tours during the OCP World Summit last October; AFTER I landed back in Dallas. I was bummed
-
barfield
Anyway, thanks for your assistance danmcd. Gonna go back to manually provisioning VMs lol
-
barfield
VPN Disconnected on me and kicked me off of IRC. So the question is: Is this a transient message or is it something I should investigate in the BIOS? "WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group."
-
rmustacc
Is it a 1P or 2P server?
-
barfield
I am actually not sure that I am familiar with the term. Let me google that real quick
-
rmustacc
How many sockets?
-
barfield
2
-
barfield
Intel E5-2667 I believe
-
rmustacc
The original E5-2667?
-
barfield
Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz <- from sysinfo
-
barfield
System Configuration: Dell Inc. PowerEdge FC630
-
rmustacc
Then yes. There's probably a BIOS entry that's hiding that and not admitting NUMA-ness.
-
rmustacc
That was from around that era of system.
-
barfield
I noticed a menu when I was in the BIOS last night that lets you tweak the ACPI settings, but I didn't navigate to it. Will check tonight. Thank you, and apologies for misspelling your name earlier with Dan haha
-
barfield
One last thing
-
barfield
In my experience with FreeBSD, NUMA gave significant performance enhancements in packet capture applications that directly access NIC ring buffers...is there a significant gain in performance in the hypervisor world with NUMA? May sound silly but I just haven't spent enough time learning about NUMA in this arena
-
rmustacc
So I think the better way to think about it is that your system design is NUMA inherently in this case.
-
barfield
Ah, so not using it in and of itself introduces a lack of performance
-
rmustacc
When you have a 2 socket server, DRAM is attached to a given CPU and otherwise it has to go across the QPI bus (given this particular CPU).
-
rmustacc
Well, it's more a statement of reality versus anything else.
-
rmustacc
Some CPU cores are closer to memory, some are further from it.
-
barfield
Yes I understand this from the NIC IRQ mapping in FreeBSD
-
rmustacc
PCIe devices are going to be closer to some memory and further from others. The cost of being close and far will vary.
-
barfield
interrupts*
-
rmustacc
So depending on VM size and memory needs, splitting them across the numa domains can help.
-
rmustacc
But there's a lot of caveats and depends on how firmware actually sets up memory interleaving.
-
barfield
Excellent!
-
rmustacc
In illumos there is a notion of things called 'locality groups' or lgrps.
-
rmustacc
Similarly subsets of a single socket can be further broken down into processor groups or pgs.
-
rmustacc
You can see what's here with lgrpinfo and pginfo. On i86pc this information is communicated to the OS through a series of ACPI tables.
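A small sketch for capturing both views in one go. The wrapper name `show_topology` is hypothetical; it runs the two tools with no options and notes when one is missing, so it is harmless on a non-illumos box:

```shell
#!/bin/sh
# Hypothetical wrapper: dump locality groups (lgrps) and processor
# groups (pgs) using the stock illumos tools, if they are installed.
show_topology() {
    for tool in lgrpinfo pginfo; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            "$tool"
        else
            echo "$tool: not found (illumos tool)"
        fi
    done
}
```

On a two-socket box where the SRAT table is readable, one would expect `lgrpinfo` to report more than one lgroup; with the "lgrp support will be limited to one group" warning it will show just one.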
-
barfield
Thank you Sir!
-
barfield
Much appreciated. That output explains a lot
-
barfield
I keep hitting what I believe to be a bug in bhyve...
-
barfield
vm goes into down state...previously I was able to confirm a zombie process remained and forced me to reboot and I had no errors in logs
-
barfield
Today on a bhyve VM with 220GB RAM I ran `vmadm kill -s 9 UUID`. It worked, the VM went down, but then stayed in "down" state and will not transition to stopped state. Platform log has a munmap_memseg failed error. Here are the last few lines of platform.log
-
barfield
-
barfield
I do wonder if it is related to the NUMA/ACPI error I posted and discussed with Robert earlier in the chat
-
barfield
Wow now I'm getting this: WARNING: /pci@0,0/pci1028,61b@1a (ehci0): No SOF interrupts have been received, this USB EHCI hostcontroller is unusable
-
barfield
FC series is interesting to say the least lol
-
barfield
I'm leaving for the day, just wanted to post all of these things that I keep running into before I drive off. Have a great weekend if no one comes online before Monday