07:14:11 Nice article about SMF: https://www.davepacheco.net/blog/2026/smf-properties/
15:09:56 FreeBSD just removed le(4D) support yesterday. I'm pretty sure we ripped it out during Solaris 10 bringup.
15:11:44 oh that's a blast from the past..
15:12:45 and the infamous ce
15:13:57 le was the onboard chip for late Sun-3s and the SPARCstations 1 & 2.
15:14:19 IIRC my first UltraSPARC I workstation at Sun was an early model that also had le on it.
15:14:31 https://illumos.org/opensolaris/bugdb/bug.html#!4942766
15:14:32 → OpenSolaris issue 4942766: Remove le driver from ON10 (Closed)
15:14:34 I'm trying to remember what was on the SPARC 5 & 20
15:14:56 the two Sun boxes I ever got access to :)
15:18:02 https://en.wikipedia.org/wiki/SPARCstation_20 says `le`.
15:18:18 ok.. that sounds right..
15:18:54 they were running Solaris 2.4 at the time, and at one point I believe DiskSuite was installed (uugh)
15:20:17 (I greatly disliked the admin interface of DiskSuite.. while maybe not admin hostile, it was certainly admin indifferent :P)
15:23:06 naming is important, and just giving a user an arbitrary number (or these days, a GUID) is just a giant FU IMO
19:31:57 almost all the "classic" Sun machines had the Lance ethernet
19:32:22 and optional weirdness like ATM and FDDI
19:33:19 I have this vague memory that le vs. hme was the difference between the "Ultra 1"/"Ultra 2" and the "Ultra 1 Enterprise"/"Ultra 2 Enterprise"
19:33:21 along with a framebuffer
20:09:42 yeah, it was early in S10:
20:09:44 PSARC 2003/335 EOL of le Ethernet driver
20:09:44 4942766 Remove le driver from ON10
20:10:36 those machines would never run 64-bit, so ON10 was toxic for them eventually anyway
20:10:50 they had that one bug nobody ever explains
20:16:13 and this many years later, there may be no one left who remembers the details
20:27:03 was 64-bit the 'prize' for the Happy Meal? :)
20:34:34 heh...
and the choice of reusing EBADE instead of just adding a new error code for ZFS continues to cause confusion :)
21:42:11 any HBA-driver experts present? My newly imaged OmniOS host is seeing the same problem report as https://illumos.topicbox.com/groups/developer/Tc07685e55bd13e0d-M68f6ceb0f3e2a1cf3bbeb89d and I'm curious if it is really ignorable or if I can/need to do something to fix it.
21:43:02 In my case the complaint is "Mar 3 12:17:25 fs2 scsi: [ID 107833 kern.warning] WARNING: /pci@95,0/pci8086,352c@5/pci1000,4060@0 (mpt_sas0):#012#011Number of phys reported by HBA SAS IO Unit Page 0 (11) is greater than that reported by the manufacturing information (8). Driver phy count limited to 8. Please contact the firmware vendor about this."
21:43:34 value='SAS3808ALLHBA 9500-8i03-50134-01004SPF3001010'
21:43:35 value='HBA 9500-8i'
21:43:35 value='MPTSAS HBA Driver 00.00.00.24'
21:43:35 value='9500-8i Tri-Mode HBA'
21:44:11 it seems like it's saying there are two ways to get that value and they give different answers; we picked the smaller, but you should ask LSI's descendants to make it not do that
21:44:16 which seems like it's trying to imply you're ok
21:44:32 I'm not an expert
21:45:18 * ENOMAD nods
21:45:50 My 'concern' (said gently) is the "count limited to 8" part. This host could eventually have up to 36 SAS devices connected. Right now we only have 11.
21:49:18 rmustacc: is there a reason we couldn't convert pci_boot.c to use the busra.c interfaces (ndi_ra_XXX)?
21:50:01 (it seems like it'd be nicer, and seems like it'd allow most of the code to not care about which PCI segment it's on)
22:45:21 so I'm trying to understand where mblks can get queued between a NIC driver (specifically i40e) and a TCP socket. There's the receive ring, then there are soft rings in mac, and then squeues entering ip. Anywhere else?
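The pipeline in that last question (driver rx ring → mac soft rings → squeues → socket) can be illustrated with a toy model. This is not illumos code — just a sketch with made-up arrival and drain rates — showing how any stage that drains even slightly slower than packets arrive will eventually pin every buffer the driver can loan out:

```python
from collections import deque

# Toy model (not illumos code) of the rx path described above:
# driver rx ring -> mac soft ring -> squeue -> socket. A deferred mblk
# anywhere upstack still pins the driver's loaned rx buffer.

RX_BUFFERS = 1024          # buffers the driver can loan upstack

def run(arrivals_per_tick, drains_per_tick, ticks):
    """Return the peak number of buffers simultaneously on loan."""
    loaned = 0             # buffers currently held somewhere upstack
    soft_ring = deque()    # mblks parked in an intermediate queue
    max_loaned = 0
    for _ in range(ticks):
        # the driver loans a buffer per arriving packet, until the pool is dry
        for _ in range(arrivals_per_tick):
            if loaned < RX_BUFFERS:
                loaned += 1
                soft_ring.append(object())
        # peak loans occur right after a burst of arrivals
        max_loaned = max(max_loaned, loaned)
        # downstream drains at its own rate; if slower, mblks accumulate
        for _ in range(min(drains_per_tick, len(soft_ring))):
            soft_ring.popleft()
            loaned -= 1
    return max_loaned

# arrivals slightly outpace drains: the loan count creeps to the cap
print(run(arrivals_per_tick=10, drains_per_tick=9, ticks=2000))   # 1024
# drains keep up: loans stay bounded at one tick's burst
print(run(arrivals_per_tick=10, drains_per_tick=10, ticks=2000))  # 10
```

The point of the sketch is that the loan count is governed by the slowest consumer, so a sustained imbalance anywhere upstack shows up as buffer exhaustion back in the driver, even when every individual stage is making progress.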
22:45:21 (I'm trying to figure out how a single active TCP connection that's coming from a 1 Gbit/s link and being actively read by the receiver can cause the i40e driver to run out of receive buffers after loaning out ~1024 of them to mac and points downstream..)
22:46:32 that suggests to me that something is causing large packet batches to accumulate somewhere along the pipeline.
22:52:37 rzezeski: this sounds like something you know
22:53:36 one thing I've thought about, but haven't dug into too deeply to see how difficult it'd be, is for mblk_t's going upstack that are being loaned up, to copy and release the original mblk_t if processing gets deferred for some reason
22:54:51 since the loaned-up resources are often shared amongst multiple 'streams' (tcp connections/etc), they can potentially hog loaned-out resources from down the stack
22:55:11 i saw this in a bug I was never able to entirely chase down with inter-zone traffic on the same box
22:59:57 the specific thing I'm chasing is that with a 4M tcp window, throughput sucks on most connections (~200 Mbit/s); with a 500k window it goes at 9xx Mbit/s (gigabit-ish line rate).
23:01:36 my working theory is that there are standing waves building up *somewhere*, and when the sender fills the tcp window it gives the receiver a chance to drain and stay caught up.
23:03:40 jbk: I think the hard part is that there are potentially so many places for mblks to get queued that knowing where to look is half the battle..
23:05:55 ENOMAD: so how are things cabled up? expanders, or multiple 9500s?
23:07:23 sommerfeld, single 9500
23:07:39 I presume expanders. I uploaded the prtconf to the ticket I just opened.
23:11:32 so it's probably at that point only counting the 8 ports on the 9500 and not the other ports on the expander(s) plugged into some of those ports
23:13:00 note that it says "phy count", not something like 'target count'
23:13:43 hmm. Interesting.
23:13:58 Not sure why the number is odd, but it is a potentially reasonable interpretation.
23:15:28 sommerfeld: Which congestion control algorithm are you using?
23:16:22 (I discovered last year that our "cubic" is possibly rubbish)
23:18:09 i am in fact using cubic
23:18:46 (which seemed to help for long-haul connections, which this test was not..)
23:19:36 sommerfeld: Does it improve if you switch back to sunreno?
23:19:43 trying that now
23:20:08 The conditions where I was seeing this issue were also a speed imbalance: a 10G server into a generally 1G network, etc.
23:20:59 yah, with both sunreno and newreno set as the congestion control on the sender, I don't see the speed collapse
23:21:03 yeeeeah
23:21:04 sigh
23:21:28 this is the reverse situation (1G sender, 10G receiver)
23:24:38 I don't think I got around to filing a bug for this (with my apologies), but it definitely seems like a real issue
23:40:24 I still think there's something wrong going on between the driver and tcp on the receiver, independent of tcp congestion control
23:43:11 (because the trigger for the aforementioned rx_bind_norcb events in i40e is too many buffers on loan from driver to mac)
23:46:26 ENOMAD: I added that comment. There are basically two different log pages that report that, and my memory is that in this case we had things that were from other devices.
23:46:41 ENOMAD: In your case with an LSI 8i you only have 8 actual PHYs on that HBA that can be directly connected anyway.
23:47:29 jbk: Eventually we will rewrite pci_boot.c, but the main reason no one has yet is that it has a huge amount of testing implications.
23:47:48 But fundamentally one wants this to mostly be able to look like hotplug, after a fashion.
23:47:57 And not have multiple divergent paths.
23:48:23 If I were going to work on that project, I'd first go finish the project to allow me to do arbitrary PCIe bridges in propolis, so I can fake up all the different corner cases of devices and resources.
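The 4M-versus-500k window behavior discussed above is at least consistent with some back-of-envelope arithmetic. Assuming a standard 1500-byte MTU and a rough one-packet-per-loaned-buffer mapping (both my simplifications; the log doesn't state them), a 4 MB window permits far more packets in flight than the ~1024 rx buffers i40e had on loan, while a 500 KB window stays comfortably under:

```python
# Back-of-envelope check (my numbers, not from the log) of why a 4 MB
# window could swamp the receiver while a 500 KB window does not.

MTU = 1500                 # bytes per packet; assumes standard ethernet
RX_BUFFERS = 1024          # i40e rx buffers on loan, per the discussion

def packets_in_flight(window_bytes):
    """Packets a sender may have outstanding with this TCP window."""
    return window_bytes // MTU

for window in (500 * 1024, 4 * 1024 * 1024):
    pkts = packets_in_flight(window)
    verdict = "exceeds" if pkts > RX_BUFFERS else "fits in"
    print(f"{window // 1024:5d} KB window -> up to {pkts} packets "
          f"in flight ({verdict} {RX_BUFFERS} rx buffers)")
```

With these assumptions the 500 KB window allows ~341 packets outstanding, well inside the loan pool, while the 4 MB window allows ~2796 — which would line up with the observation that shrinking the window lets the receiver drain and stay caught up.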
23:49:14 i ask because to get segments to work, you pretty much have to touch a _lot_ (and I mean a _lot_) of pci_boot.c
23:49:49 Doesn't really change what I'd do first.
23:50:02 Which is have a good way to test arbitrary topologies with a VM configuration that I can automated.
23:50:06 *automate
23:53:42 that's not a very useful answer tbh
23:54:43 I mean, if I was going to rewrite it I'd want to do that first.
23:54:48 It may make sense and be the right way.
23:55:00 But again, how do we test it is the big question that I'd have.
23:55:03 It's really high risk.
23:55:23 I've not looked at the ndi_busra stuff in detail, sorry. Keith did that on Oxide.
23:56:09 So, no, I guess I don't know of a reason, but if it was me, I'd first figure out how to test it all without needing every different hardware config under the sun. There are other ways to look at it too, like figuring out how to write it so you can drive it outside of a specific booting config.
23:56:27 Dunno, happy to talk live or something if that'd be more useful for you. Not sure if I can give you the answer you want.
23:56:33 Or ask the question again and I'll try to do better.
23:58:40 I'd probably also see how Rich redid enumeration, which I know has been described and I've forgotten.