00:31:52 copec: I explored fast NAT a bit... and it's pretty hilarious how fast that rabbit hole goes, sad face lol
00:38:53 Yeah. I had to upgrade my old home server to do 1Gb, and saturating the link pretty much eats all of its CPU. So I'll probably have to get a real router whenever I upgrade from that.
00:44:10 You might wanna try the bhyve appliance approach first. (Unless @Smithx10 you ARE talking about a bhyve-based NAT...)
00:45:00 There's potential to build a better NAT for illumos. A very larval approach (that focuses on vxlan internal nets) is in the `nat-reform` branch of illumos-joyent.
00:45:30 (tl;dr ==> Use conn_t infrastructure for NAT sessions, rewire conn_recv & conn_xmit, etc. etc.)
00:46:18 I tried OPNsense, and it is really nice, but it was only slightly ahead of the illumos NAT in CPU usage
00:47:41 I wish I were an experienced C dev and could jump into something like that (or Rust, are we using any Rust in the kernel?)
00:49:10 * copec checks out nat-reform branch
00:53:49 danmcd: I was looking at whether there was an eBPF firewall on Linux that did NAT (there wasn't anything concrete I found), and then it goes into hardware-offloaded conn tracking in Mellanox cards lol
00:54:02 I don't have that kind of need for NAT speed lol
01:37:04 copec: --> yeah, pfSense or OPNsense was what I was hearing about.
01:37:25 copec: the `vxlnat` sources are where you should look.
01:57:57 I was just reading this from a few years ago https://www.crowdsupply.com/traverse-technologies/ten64/updates/10g-options-and-performance and saw this https://forum.traverse.com.au/t/freebsd-preview-for-ten64/173
01:59:12 speaking of eBPF...
01:59:13 https://twitter.com/tianyin_xu/status/1671857283263868930
02:09:34 eBPF is really neat, but it is sort of coming at something from the wrong direction. Sort of like making containers in Linux vs. Solaris Zones. I think to be done right it needs to be built with very specific goals to begin with.
02:14:57 I would actually say MS Singularity would be a good example of coming at it the right way to begin with, but I reveal my stupidity
02:15:48 ^For dynamic kernel programming
02:16:54 namespaces are fine for what Google designed them for (workload isolation), but the 'IKEA' approach to making them secure vs. starting from a secure base is, I think, flawed
02:19:58 I wonder if Sun's JavaOS explored this same thing?
02:22:08 https://en.wikipedia.org/wiki/JavaOS#Overview
02:22:12 I'm on a random-topic kick
02:26:09 I would guess not quite; probably another hip microkernel thing that used protected memory for each process
13:54:38 Hi all, just tried using a Mellanox ConnectX-4 with SmartOS, however I'm facing an issue with the card and it's basically throwing me into maintenance mode
13:55:46 Ended up doing modunload on the kernel module but don't know where to go from there
14:40:38 Odd, CX-4 should be well supported by mlxcx(4D). Any messages in /var/adm/messages* from mlxcx?
15:05:43 danmcd: can't seem to find any messages related to mlxcx
15:06:27 I do however have the errors that are forcing me into maintenance mode
15:07:19 mlxcx0: limiting number of rx groups to 127 based on max number of rx flow tables
15:08:57 mlxcx0: command mlxcx0 mlxcx_op_access_reg 0x805 failed with status code mlxcx_cmd_r_bad_param
15:09:30 mlxcx0: failed op_access_reg was for register 9009 (mtcap)
15:15:56 Those last two are interesting. You may wish to ask about them on #illumos to see if anyone (esp. arekinath if he's around) recognizes them.
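(A minimal sketch of the log check danmcd suggests above, assuming the standard illumos log locations; nothing here is specific to this particular box:)

    # Pull any mlxcx driver messages out of the system logs:
    grep -i mlxcx /var/adm/messages*
    # Check the kernel message buffer too, in case the logs have rotated:
    dmesg | grep -i mlxcx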
15:16:20 There might come a point where I dive into mlxcx myself for other reasons, but alas that's not now; sorry I can't be of more immediate assistance.
15:16:36 I *do* know that CX-4 and CX-5 cards have been known to work with mlxcx(4D).
15:17:31 you might also want to check firmware versions
15:24:32 unfortunately, I don't think we currently support updating the firmware within the OS for that card
15:24:47 Thank you both for your assistance. I'll have a go at flashing newer firmware tonight and see if the experience improves.
15:25:22 I'll have a go at flashing with my Fedora desktop
19:48:53 danmcd: I received this error in dmesg after booting a Dell FC-series chassis blade (FC630 model). I am not sure how I would go about complying with the error; any advice? "2023-06-23T16:40:09.185026+00:00 48-4d-7e-58-c0-e1 i40e: [ID 517869 kern.info] NOTICE: i40e0: The driver for the device detected a newer version of the NVM image (1.12) than expected (1.10). Please install the most recent version of the network driver."
19:49:34 You can ignore that unless Bits Are Not Moving. It's a safety check put in by Intel-written code.
19:49:55 You can always attempt to update the FW on the i40e, but as with all FW updates, it can get tricky.
19:50:50 [root@curly (kebecloud) ~]# dladm show-phys | grep i40e | grep up | wc -l
19:50:50 3
19:50:50 [root@curly (kebecloud) ~]# grep "The driver for the " /var/adm/messages* | wc -l
19:50:52 24
19:50:54 [root@curly (kebecloud) ~]#
19:50:55 Is there a utility for firmware updates on illumos? Back in the day we would boot into firmware-update mode when GRUB was the bootloader, and that was just another boot option with FreeDOS bundled
19:51:11 I wouldn't worry, again, unless BITS ARE NOT MOVING.
19:51:25 [root@48-4d-7e-58-c0-e1 ~]# grep -c i40e /var/adm/messages
19:51:25 28
19:51:29 And if you're running X722 over BaseT, be aware of jinni illumos#13230
19:51:30 https://www.illumos.org/issues/13230
19:51:49 To my knowledge I am not
19:52:30 One more question for you: in the current PI, is dl memory managed better now on the i40e cards? I've had issues with bhyves and i40es fighting over vmm
19:55:02 I'm wondering if anyone in here has messed around with bhyve instances assigned 512GB of RAM or more, too; any input is appreciated. I haven't googled this yet, so apologies in advance if bhyve doesn't yet support this much RAM.
19:55:41 my 128 and 256GB instances already take 2 solid minutes to transition to the started state in vmadm lol
19:55:57 But they're lightning fast once started
19:57:02 papertigers: may have some insight on the i40e/bhyve RAM fighting. I think I remember he gave me the intel on the issue to begin with.
19:57:02 What PI are you running now?
19:57:44 latest as of yesterday from wiki.smartos.org... vanilla USB img. Simulating workloads on some nodes that I put onsite at a customer location, in an effort to sell them on ditching their metal and moving into our Triton cloud.
19:58:05 09:40:09 2021
19:58:05 14:46 -!- Irssi: Join to #smartos was synced i
19:58:10 whoops
19:58:18 20230615T000418Z
19:58:43 So release-20200730 and later contain jinni illumos#12958 and illumos#12957
19:58:44 https://www.illumos.org/issues/12957
19:58:44 https://www.illumos.org/issues/12958
19:59:18 thanks jinni
19:59:28 that improves things somewhat, but if a new zone is spun up it can freeze up, because the allocation for i40e VNICs does the painful part up-front.
20:00:23 Yeah, I want to say it was mustachi who mentioned that part before. Though I could be wrong... CRS is in high gear
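(A quick way to check whether a node's PI carries those fixes, assuming uname -v reports the platform-image stamp as on stock SmartOS:)

    # The PI stamp; release-20200730 and later carry the
    # illumos#12957/#12958 i40e improvements discussed above:
    uname -v    # e.g. joyent_20230615T000418Z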
20:00:26 (The workaround for it is to touch a memory global in the kernel using mdb -k, cross your fingers it picks up steam, then tweak it back to its default)
20:00:45 "15:55 barfield: my 128 and 256GB instances already take 2 solid minutes to transition to started state in vmadm lol"
20:00:59 Is it something that could be updated in kernel boot options?
20:00:59 Yeah, that's symptomatic. Tell me, is this at start time?
20:01:08 Correct
20:01:11 Hmmm....
20:01:28 Can you see what the ptree(1) is of the zone's procs when it's hung?
20:01:32 `vmadm start UUID-with-128GB-ram` is like waiting a decade for `vmadm console` to spit stuff out
20:02:13 Yes... but between start and visible progress, what does "ptree `pgrep -z `" say?
20:02:42 Next time I get a chance to reboot one, absolutely. I am about to build a VM for the POC customer and I can do it when I get one of those booted as well. The NICs in these FC blades are i40e, but they're technically 4x 10Gb SFP+ interfaces. The i40e issues I've been dealing with are on native QSFP+ interfaces.
20:03:03 I will get that for you, maybe today/tonight, worst case Monday
20:03:05 Actually, -z might yield nothing, as the zone isn't started fully yet. If there's a dladm(8) process running relating to the booting zone, it's more than likely that memory-up-front problem.
20:03:49 Ahhh, so I didn't realize the dladm issue was happening during VM startup. I thought it was during boot, when all of the VNICs are associated
20:04:04 New VM, new VNICs
20:04:33 How much RAM, typically, theoretically, or best-practice, should one reserve for the global zone if using 40 or 100Gb interfaces?
20:05:01 My previous issues were bhyves crashing/hanging
20:05:11 because I didn't have enough GZ RAM reserved on the hosts
20:06:33 I don't know off the top of my head.
20:07:15 I'm just wondering if an equivalent to wire speed would be sufficient
20:07:24 Have 40Gb NICs, reserve 40GB RAM
20:07:44 May be too basic lol... just thinking out loud I suppose
20:08:10 I don't know if I told anyone yet, but I am no longer at Joyent
20:08:15 It's a function of how many VNICs are over it.
20:08:22 So you're officially full time on your side gig now?
20:08:27 Si Señor
20:08:30 OCP full speed
20:08:33 And I remember, to unblock it's:
20:08:52 SmartCloud Solutions; we're rolling up into a bigger company now though
20:09:47 echo "needfree/W0x4000" | mdb -kw
20:10:02 Everything going good for the commercial team now that it's MNX?
20:10:04 Thank you sir!
20:10:05 echo "needfree/W0x0" | mdb -kw
20:10:11 So far so good.
20:10:13 I'll document that now in our SOP docs
20:12:02 It's a big hammer, be careful using it.
20:12:17 Excellente. We're building awesome stuff and hope to share some business with Nick soon. I can't talk too much about it, but SmartCloud is #3 globally, under Meta and Nokia, in terms of OCP organic traffic impressions. Only integrated solutions provider in the US and only provider certified on SmartOS
20:13:52 OCP Inspired + OCP Certified is the proper term. I saw that Oxide was giving tours during the OCP World Summit last October; AFTER I landed back in Dallas. I was bummed
20:14:46 anyway, thanks for your assistance danmcd. Gonna go back to manually provisioning VMs lol
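(Pulling the diagnostic and the workaround above into one sketch. The pgrep/ptree step just looks for the dladm(8) process danmcd mentions; the needfree values are exactly the ones he gave, and per his warning this is a big hammer poked at a kernel global, so use with care:)

    # While a big-RAM zone is stuck between "start" and visible progress,
    # look for a dladm process doing the up-front VNIC memory allocation:
    pgrep -fl dladm
    ptree $(pgrep dladm)
    # If it's the memory-up-front stall, raise needfree so the kernel
    # frees memory, then restore the default once the zone is up:
    echo "needfree/W0x4000" | mdb -kw
    # ...wait for the zone to finish starting...
    echo "needfree/W0x0" | mdb -kw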
20:45:37 VPN disconnected on me and kicked me off of IRC. So the question is: is this a transient message, or is it something I should investigate in the BIOS? "WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group."
20:46:07 Is it a 1P or 2P server?
20:46:43 I am actually not sure that I am familiar with the term. Let me google that real quick
20:46:48 How many sockets?
20:46:52 2
20:47:06 Intel E5-2667, I believe
20:47:26 The original E5-2667?
20:48:02 Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz <- from sysinfo
20:49:00 System Configuration: Dell Inc. PowerEdge FC630
20:49:31 Then yes. There's probably a BIOS entry that's hiding that and not admitting NUMA-ness.
20:49:37 That was from around that era of system.
20:50:15 I noticed a menu when I was in the BIOS last night that lets you tweak the ACPI settings, but I didn't navigate to it. Will check tonight. Thank you, and apologies for misspelling your name earlier haha
20:50:41 One last thing
20:52:02 in my experience with FreeBSD, NUMA gave significant performance enhancements in packet-capture applications directly accessing NIC ring buffers... is there a significant gain in performance in the hypervisor world with NUMA? May sound silly, but I just haven't spent enough time learning about NUMA in this arena
20:53:54 So I think the better way to think about it is that your system design is inherently NUMA in this case.
20:54:13 Ah, so not using it in and of itself introduces a lack of performance
20:54:13 When you have a 2-socket server, DRAM is attached to a given CPU, and otherwise access has to go across the QPI bus (given this particular CPU).
20:54:31 Well, it's more a statement of reality than anything else.
20:54:38 Some CPU cores are closer to memory, some are further from it.
20:54:55 Yes, I understand this from the NIC IRQ mapping in FreeBSD
20:55:00 PCIe devices are going to be closer to some memory and further from other memory. The cost of being close and far will vary.
20:55:07 interrupts*
20:55:44 So depending on VM size and memory needs, splitting them across the NUMA domains can help.
20:55:56 But there are a lot of caveats, and it depends on how firmware actually sets up memory interleaving.
20:55:58 Excellent!
20:56:13 In illumos there is a notion of things called 'locality groups', or lgrps.
20:56:37 Similarly, subsets of a single socket can be further broken down into processor groups, or PGs.
20:56:57 You can see what's here with lgrpinfo and pginfo (see the sketch at the end of this log). On i86pc this information is communicated to the OS through a series of ACPI tables.
21:02:02 Thank you Sir!
21:02:11 Much appreciated. That output explains a lot
21:32:26 I keep hitting what I believe to be a bug in bhyve...
21:32:48 vm goes into down state... previously I was able to confirm a zombie process remained and forced me to reboot, and I had no errors in logs
21:34:02 Today on a bhyve VM with 220GB RAM I ran `vmadm kill -s 9 UUID`. It worked, the VM went down, but then it stayed in the "down" state and will not transition to the stopped state. The platform log has a munmap_memseg failed error. Here are the last few lines of platform.log:
21:34:06 https://pastebin.com/Sk6JLsS3
21:36:00 I do wonder if it is related to the NUMA/ACPI error I posted and discussed with Robert earlier in the chat
21:37:00 Wow, now I'm getting this: WARNING: /pci@0,0/pci1028,61b@1a (ehci0): No SOF interrupts have been received, this USB EHCI host controller is unusable
21:37:14 The FC series is interesting, to say the least lol
21:37:46 I'm leaving for the day; just wanted to post all of these things that I keep running into before I drive off. Have a great weekend if no one comes online before Monday
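(The locality-group inspection mentioned at 20:56 above; lgrpinfo(1) and pginfo(1) are standard illumos utilities, though which flags are most useful is a judgment call, so check their man pages:)

    # Show all locality groups, with their CPUs and memory:
    lgrpinfo -a
    # Show the processor-group hierarchy (sockets, cores, shared caches):
    pginfo -p -v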