00:31:52 copec: I explored fast NAT a bit... and it's pretty hilarious how fast that rabbit hole goes, sad face lol
00:38:53 Yeah. I had to upgrade my old home server to do 1Gb, and saturating the link pretty much eats all of its CPU. So I'll probably have to get a real router whenever I upgrade from that.
00:44:10 You might wanna try the bhyve appliance approach first. (Unless @Smithx10 you ARE talking about a bhyve-based NAT...)
00:45:00 There's potential to build a better NAT for illumos. A very larval approach (that focuses on vxlan internal nets) is in the `nat-reform` branch of illumos-joyent.
00:45:30 (tl;dr ==> Use conn_t infrastructure for NAT sessions, rewire conn_recv & conn_xmit, etc. etc.)
00:46:18 I tried OPNsense, and it is really nice, but it was only slightly ahead of the illumos NAT in CPU usage
00:47:41 I wish I were an experienced C dev and could jump into something like that (or Rust, are we using any Rust in the kernel?)
00:49:10 * copec checks out nat-reform branch
00:53:49 danmcd: I was looking at whether there was an eBPF firewall on Linux that did NAT (there wasn't anything concrete I found), and then it goes into hardware-offloaded conn tracking in Mellanox cards lol
00:54:02 I don't have that kind of need for NAT speed lol
01:37:04 copec: --> yeah, pfSense or OPNsense was what I was hearing about.
01:37:25 copec: the `vxlnat` sources are where you should look.
01:57:57 I was just reading this from a few years ago https://www.crowdsupply.com/traverse-technologies/ten64/updates/10g-options-and-performance and saw this https://forum.traverse.com.au/t/freebsd-preview-for-ten64/173
01:59:12 speaking of eBPF...
01:59:13 https://twitter.com/tianyin_xu/status/1671857283263868930
02:09:34 eBPF is really neat, but it is sort of coming at something from the wrong direction. Sort of like making containers in Linux vs. Solaris Zones. I think to be done right it needs to be built with very specific goals to begin with.
02:14:57 I would actually say MS Singularity would be a good example of coming at it the right way to begin with, but I reveal my stupidity
02:15:48 ^For dynamic kernel programming
02:16:54 namespaces are fine for what Google designed them for (workload isolation), but the 'IKEA' approach to making them secure vs. starting from a secure base is, I think, flawed
02:19:58 I wonder if Sun's JavaOS explored this same thing?
02:22:08 https://en.wikipedia.org/wiki/JavaOS#Overview
02:22:12 I'm on a random-topic kick
02:26:09 I would guess not quite; probably another hip microkernel thing that used protected memory for each process
13:54:38 Hi all, just tried using a Mellanox ConnectX-4 with SmartOS, however I'm facing an issue with the card and it's basically throwing me into maintenance mode
13:55:46 Ended up doing modunload on the kernel module but don't know where to go from there
14:40:38 Odd, CX-4 should be well supported by mlxcx(4D). Any messages in /var/adm/messages* from mlxcx?
15:05:43 danmcd: can't seem to find any messages related to mlxcx
15:06:27 I do however have the errors that are forcing me into maintenance mode
15:07:19 mlxcx0: limiting number of rx groups to 127 based on max number of rx flow tables
15:08:57 mlxcx0: command mlxcx0 mlxcx_op_access_reg 0x805 failed with status code mlxcx_cmd_r_bad_param
15:09:30 mlxcx0: failed op_access_reg was for register 9009 (mtcap)
15:15:56 Those last two are interesting. You may wish to ask about them on #illumos to see if anyone (esp. arekinath if he's around) recognizes them.
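(A minimal sketch of the log check danmcd suggests above, assuming the standard illumos log locations; nothing here is specific to this particular box:)

    # Pull any mlxcx driver messages out of the system logs:
    grep -i mlxcx /var/adm/messages*
    # Check the kernel message buffer too, in case the logs have rotated:
    dmesg | grep -i mlxcx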
15:16:20 There might come a point where I dive into mlxcx myself for other reasons, but alas that's not now; sorry I can't be of more immediate assistance.
15:16:36 I *do* know that CX-4 and CX-5 cards have been known to work with mlxcx(4D).
15:17:31 you might also want to check firmware versions
15:24:32 unfortunately, I don't think we currently support updating the firmware within the OS for that card
15:24:47 Thank you both for your assistance. I'll have a go at flashing newer firmware tonight and see if the experience improves.
15:25:22 I'll have a go at flashing with my Fedora desktop
19:48:53 danmcd: I received this error in dmesg after booting a Dell FC-series chassis blade (FC630 model). I am not sure how I would go about complying with the error; any advice? "2023-06-23T16:40:09.185026+00:00 48-4d-7e-58-c0-e1 i40e: [ID 517869 kern.info] NOTICE: i40e0: The driver for the device detected a newer version of the NVM image (1.12) than expected (1.10). Please install the most recent version of the network driver."
19:49:34 You can ignore that unless Bits Are Not Moving. It's a safety check put in by Intel-written code.
19:49:55 You can always attempt to update the FW on the i40e, but as with all FW updates, it can get tricky.
19:50:50 [root@curly (kebecloud) ~]# dladm show-phys | grep i40e | grep up | wc -l
19:50:50 3
19:50:50 [root@curly (kebecloud) ~]# grep "The driver for the " /var/adm/messages* | wc -l
19:50:52 24
19:50:54 [root@curly (kebecloud) ~]#
19:50:55 Is there a utility for firmware updates on illumos? Back in the day we would boot into firmware-update mode when GRUB was the bootloader, and that was just another boot option with FreeDOS bundled
19:51:11 I wouldn't worry, again, unless BITS ARE NOT MOVING.
19:51:25 [root@48-4d-7e-58-c0-e1 ~]# grep -c i40e /var/adm/messages
19:51:25 28
19:51:29 And if you're running X722 over BaseT, be aware of jinni illumos#13230
19:51:30 https://www.illumos.org/issues/13230
19:51:49 To my knowledge I am not
19:52:30 One more question for you: in the current PI, is dl memory managed better now on the i40e cards? I've had issues with bhyves and i40es fighting over vmm
19:55:02 I'm wondering if anyone in here has messed around with bhyve instances assigned 512GB of RAM or more, too; any input is appreciated. I haven't googled this yet, so apologies in advance if bhyve doesn't yet support this much RAM.
19:55:41 my 128 and 256GB instances already take 2 solid minutes to transition to the started state in vmadm lol
19:55:57 But they're lightning fast once started
19:57:02 papertigers: may have some insight on the i40e/bhyve RAM fighting. I think I remember he gave me the intel on the issue to begin with.
19:57:02 What PI are you running now?
19:57:44 latest as of yesterday from wiki.smartos.org... vanilla USB img. Simulating workloads on some nodes that I put onsite at a customer location, in an effort to sell them on ditching their metal and moving into our Triton cloud.
19:58:05 09:40:09 2021
19:58:05 14:46 -!- Irssi: Join to #smartos was synced i
19:58:10 whoops
19:58:18 20230615T000418Z
19:58:43 So release-20200730 and later contain jinni illumos#12958 and illumos#12957
19:58:44 https://www.illumos.org/issues/12957
19:58:44 https://www.illumos.org/issues/12958
19:59:18 thanks jinni
19:59:28 that improves things somewhat, but if a new zone is spun up it can freeze up, because the allocation for i40e VNICs does the painful part up-front.
20:00:23 Yeah, I want to say it was mustachi who mentioned that part before. Though I could be wrong... CRS is in high gear
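(A quick way to check whether a node's PI carries those fixes, assuming uname -v reports the platform-image stamp as on stock SmartOS:)

    # The PI stamp; release-20200730 and later carry the
    # illumos#12957/#12958 i40e improvements discussed above:
    uname -v    # e.g. joyent_20230615T000418Z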
20:00:26 (The workaround for it is to touch a memory global in the kernel using mdb -k, cross your fingers it picks up steam, then tweak it back to its default)
20:00:45 "15:55 barfield: my 128 and 256GB instances already take 2 solid minutes to transition to started state in vmadm lol"
20:00:59 Is it something that could be updated in kernel boot options?
20:00:59 Yeah, that's symptomatic. Tell me, is this at start time?
20:01:08 Correct
20:01:11 Hmmm....
20:01:28 Can you see what the ptree(1) is of the zone's procs when it's hung?
20:01:32 `vmadm start UUID-with-128GB-ram` is like waiting a decade for `vmadm console` to spit stuff out
20:02:13 Yes... but between start and visible progress, what does "ptree `pgrep -z `" say?
20:02:42 Next time I get a chance to reboot one, absolutely. I am about to build a VM for the POC customer and I can do it when I get one of those booted as well. The NICs in these FC blades are i40e, but they're technically 4x 10Gb SFP+ interfaces. The i40e issues I've been dealing with are on native QSFP+ interfaces.
20:03:03 I will get that for you, maybe today/tonight, worst case Monday
20:03:05 Actually, -z might yield nothing, as the zone isn't started fully yet. If there's a dladm(8) process running relating to the booting zone, it's more than likely that memory-up-front problem.
20:03:49 Ahhh, so I didn't realize the dladm issue was happening during VM startup. I thought it was during boot, when all of the VNICs are associated
20:04:04 New VM, new VNICs
20:04:33 How much RAM, typically, theoretically, or best-practice, should one reserve for the global zone if using 40 or 100Gb interfaces?
20:05:01 My previous issues were bhyves crashing/hanging
20:05:11 because I didn't have enough GZ RAM reserved on the hosts
20:06:33 I don't know off the top of my head.
20:07:15 I'm just wondering if an equivalent to wire speed would be sufficient
20:07:24 Have 40Gb NICs, reserve 40GB RAM
20:07:44 May be too basic lol... just thinking out loud I suppose
20:08:10 I don't know if I told anyone yet, but I am no longer at Joyent
20:08:15 It's a function of how many VNICs are over it.
20:08:22 So you're officially full time on your side gig now?
20:08:27 Si Señor
20:08:30 OCP full speed
20:08:33 And I remember, to unblock it's:
20:08:52 SmartCloud Solutions; we're rolling up into a bigger company now though
20:09:47 echo "needfree/W0x4000" | mdb -kw
20:10:02 Everything going good for the commercial team now that it's MNX?
20:10:04 Thank you sir!
20:10:05 echo "needfree/W0x0" | mdb -kw
20:10:11 So far so good.
20:10:13 I'll document that now in our SOP docs
20:12:02 It's a big hammer, be careful using it.
20:12:17 Excellente. We're building awesome stuff and hope to share some business with Nick soon. I can't talk too much about it, but SmartCloud is #3 globally, under Meta and Nokia, in terms of OCP organic traffic impressions. Only integrated solutions provider in the US and only provider certified on SmartOS
20:13:52 OCP Inspired + OCP Certified is the proper term. I saw that Oxide was giving tours during the OCP World Summit last October; AFTER I landed back in Dallas. I was bummed
20:14:46 anyway, thanks for your assistance danmcd. Gonna go back to manually provisioning VMs lol
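(Pulling the diagnostic and the workaround above into one sketch. The pgrep/ptree step just looks for the dladm(8) process danmcd mentions; the needfree values are exactly the ones he gave, and per his warning this is a big hammer poked at a kernel global, so use with care:)

    # While a big-RAM zone is stuck between "start" and visible progress,
    # look for a dladm process doing the up-front VNIC memory allocation:
    pgrep -fl dladm
    ptree $(pgrep dladm)
    # If it's the memory-up-front stall, raise needfree so the kernel
    # frees memory, then restore the default once the zone is up:
    echo "needfree/W0x4000" | mdb -kw
    # ...wait for the zone to finish starting...
    echo "needfree/W0x0" | mdb -kw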
20:45:37 VPN disconnected on me and kicked me off of IRC. So the question is: is this a transient message, or is it something I should investigate in the BIOS? "WARNING: Couldn't read ACPI SRAT table from BIOS. lgrp support will be limited to one group."
20:46:07 Is it a 1P or 2P server?
20:46:43 I am actually not sure that I am familiar with the term. Let me google that real quick
20:46:48 How many sockets?
20:46:52 2
20:47:06 Intel E5-2667, I believe
20:47:26 The original E5-2667?
20:48:02 Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz <- from sysinfo
20:49:00 System Configuration: Dell Inc. PowerEdge FC630
20:49:31 Then yes. There's probably a BIOS entry that's hiding that and not admitting NUMA-ness.
20:49:37 That was from around that era of system.
20:50:15 I noticed a menu when I was in the BIOS last night that lets you tweak the ACPI settings, but I didn't navigate to it. Will check tonight. Thank you, and apologies for misspelling your name earlier haha
20:50:41 One last thing
20:52:02 in my experience with FreeBSD, NUMA gave significant performance enhancements in packet-capture applications directly accessing NIC ring buffers... is there a significant gain in performance in the hypervisor world with NUMA? May sound silly, but I just haven't spent enough time learning about NUMA in this arena
20:53:54 So I think the better way to think about it is that your system design is inherently NUMA in this case.
20:54:13 Ah, so not using it in and of itself introduces a lack of performance
20:54:13 When you have a 2-socket server, DRAM is attached to a given CPU, and otherwise access has to go across the QPI bus (given this particular CPU).
20:54:31 Well, it's more a statement of reality than anything else.
20:54:38 Some CPU cores are closer to memory, some are further from it.
20:54:55 Yes, I understand this from the NIC IRQ mapping in FreeBSD
20:55:00 PCIe devices are going to be closer to some memory and further from other memory. The cost of being close and far will vary.
20:55:07 interrupts*
20:55:44 So depending on VM size and memory needs, splitting them across the NUMA domains can help.
20:55:56 But there are a lot of caveats, and it depends on how firmware actually sets up memory interleaving.
20:55:58 Excellent!
20:56:13 In illumos there is a notion of things called 'locality groups', or lgrps.
20:56:37 Similarly, subsets of a single socket can be further broken down into processor groups, or PGs.
20:56:57 You can see what's here with lgrpinfo and pginfo (see the sketch at the end of this log). On i86pc this information is communicated to the OS through a series of ACPI tables.
21:02:02 Thank you Sir!
21:02:11 Much appreciated. That output explains a lot
21:32:26 I keep hitting what I believe to be a bug in bhyve...
21:32:48 vm goes into down state... previously I was able to confirm a zombie process remained and forced me to reboot, and I had no errors in logs
21:34:02 Today on a bhyve VM with 220GB RAM I ran `vmadm kill -s 9 UUID`. It worked, the VM went down, but then it stayed in the "down" state and will not transition to the stopped state. The platform log has a munmap_memseg failed error. Here are the last few lines of platform.log:
21:34:06 https://pastebin.com/Sk6JLsS3
21:36:00 I do wonder if it is related to the NUMA/ACPI error I posted and discussed with Robert earlier in the chat
21:37:00 Wow, now I'm getting this: WARNING: /pci@0,0/pci1028,61b@1a (ehci0): No SOF interrupts have been received, this USB EHCI host controller is unusable
21:37:14 The FC series is interesting, to say the least lol
21:37:46 I'm leaving for the day; just wanted to post all of these things that I keep running into before I drive off. Have a great weekend if no one comes online before Monday
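(The locality-group inspection mentioned at 20:56 above; lgrpinfo(1) and pginfo(1) are standard illumos utilities, though which flags are most useful is a judgment call, so check their man pages:)

    # Show all locality groups, with their CPUs and memory:
    lgrpinfo -a
    # Show the processor-group hierarchy (sockets, cores, shared caches):
    pginfo -p -v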