00:43:59 https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/i86pc/io/mp_platform_common.c#L838-L840 <-- would anyone get too upset with a change that only prints that if Lint != 1 (the value we expect) ?
00:54:56 that seems like an entirely reasonable change..
01:24:32 i'll at least file a bug on it..
01:25:59 and I have found a legitimate bug in the iommu code, though it's one that doesn't happen unless there's more than one segment
01:26:02 https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/i86pc/io/immu_dmar.c#L709-L722
01:26:16 if IMMU_MAXSEG is > 1, it'll try to create duplicate instances
01:26:25 as I discovered :)
01:47:18 unit=0 should move outside the for (i=0 ; ..) loop, presumably?
01:47:28 yeah that's what I'm trying
16:52:15 are there any resources for decoding PCIe errors -- i'm thinking 'ereport.io.pciex.rc.ce-msg' is implying a correctable error event
16:52:28 but it'd be nice to be able to decode the particulars
16:52:39 (in this case it seems to be originating from an NVMe disk)
18:28:23 hello all. I've got a couple of questions regarding name resolution in illumos
18:29:25 1. I work on software that has a test that tries to resolve "host.invalid" as an Internet hostname
18:30:20 the test runs fine on other OSes (the hostname does not resolve, which is the intended scenario, because the test verifies that the code handles the failure to resolve correctly)
18:31:53 on illumos the test times out because the timeout is rather low (around 0.1s) and the attempt to resolve the hostname on illumos does indeed result in NXDOMAIN, but it sometimes takes around 0.5-1.0s
18:34:12 which to me seems like the illumos host does not treat the hostname as special (see RFC 6761 Section 6.4 point 3) and queries the caching resolver, which also does not recognise the name as special (point 4 ibid.), which then ends up as a regular query
18:35:37 which takes time to resolve the first time, then serves NXDOMAIN fast until the negative response expires, at which point it incurs another hiccup, and so on
18:36:17 (which is point 5 ibid.)
18:38:21 with this in mind, would it make sense to implement point 3 in the illumos name resolver library? everything that ends in ".invalid.", just before it goes to a resolver, would instead produce an immediate "host not found"
18:46:23 2. another test in the same software calls ether_hostton() two times: once for a lower-case hostname and again for the same hostname but upper-case
18:48:45 both variants are in /etc/ethers, and on most other OSes both calls to ether_hostton() succeed
18:50:25 however, on illumos the upper-case name randomly fails to resolve: if you run the test repeatedly, it will consistently fail for the first few minutes, then supposedly some change happens somewhere and the name starts to resolve fine
18:50:49 then if you wait for a few hours and repeat the same, it will again fail to resolve, and maybe not even recover later
18:51:44 this behaviour is the same as what I observed on Solaris 11.4 CBE before the most recent one (didn't test the most recent one in this regard)
18:52:06 seems to be a weird caching effect somewhere, and does not seem to be the intended behaviour
18:53:51 I think your dns setup has some flaws. default timeout in libresolv is 5 seconds. if you are using libresolv, the only cache involved is your dns server; otherwise disable nscd for the test.
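As a concrete illustration of the RFC 6761 Section 6.4 point 3 shortcut proposed above (18:38:21), a check along these lines could reject names under the special-use TLD "invalid" before any query leaves the host. This is only a sketch: the function name is made up, nothing like it exists in the illumos resolver today, and where it would best hook into libresolv or the nss_dns backend is an open question.

    #include <string.h>
    #include <strings.h>	/* strncasecmp */

    /*
     * Hypothetical helper: return nonzero if "name" falls under the
     * special-use TLD "invalid" (RFC 6761 Section 6.4).  A caller would
     * then fail the lookup immediately with a "host not found" result
     * instead of sending a query to the configured resolver.
     */
    static int
    name_is_invalid_tld(const char *name)
    {
    	size_t len = strlen(name);

    	/* Ignore a single trailing dot, as in "host.invalid.". */
    	if (len > 0 && name[len - 1] == '.')
    		len--;

    	/* Match "invalid" itself, or any name ending in ".invalid". */
    	if (len == 7 && strncasecmp(name, "invalid", 7) == 0)
    		return (1);
    	if (len > 8 && name[len - 8] == '.' &&
    	    strncasecmp(name + len - 7, "invalid", 7) == 0)
    		return (1);
    	return (0);
    }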
18:54:28 for now I have exempted illumos from the tests in issue 2, but issue 1 affects more practical scenarios, so it would be nice to be able to test it as originally intended
18:56:03 well, you can test dns with dig (and dig @8.8.8.8 for example)
18:56:28 tsoome: it does not time out, it seems to resort ultimately to the root servers, which do return an NXDOMAIN, but it takes time and there is no point in sending that query in the first place, as the RFC explains
18:57:40 I agree that the caching nameserver ought to short-circuit the query, but this does not seem to be done either, and is a separate issue
19:06:49 well, if you want a reliable test case, you can not rely on a global name server but need to have your own local caching server. In my home setup, 8.8.8.8 does return an answer with Query time: 6 msec and my local caching server Query time: 0 msec
19:07:55 that's according to dig.
19:08:20 thank you for discussing this, but the point is that it would be nice not to query a caching resolver in the first place
19:10:00 because "host.invalid" by specification must produce an NXDOMAIN, and the recommended logical shortcut is to get the same result at the host that is trying to resolve the hostname
19:11:15 in practical terms, test 1 runs fine (fails to resolve close enough to instantly) on a number of OSes, apart from sporadic timeouts on illumos, for example: https://ci.tcpdump.org/#/builders/57/builds/1759
19:12:57 given a sufficient number of retries (or favourable random conditions), it passes, so it is not an acute problem
19:14:17 but to me it clearly looks like a place for a simple and useful improvement, so maybe it would be best just to make a feature request, though I decided to ask first
19:18:20 what variable in the kernel actually gets set with boot -v ?
19:19:33 boothowto
19:19:39 & RB_VERBOSE
19:24:24 oh it's not verbose boot.. it's the debug build...
19:24:59 just to see (though maybe we should go ahead and put it up) I updated acpica to the latest release 20250807
19:25:15 but my god does it spam the kernel log
19:25:20 so we have a few branches of acpica updates in something approaching flight
19:25:30 the challenge is testing the damn thing
19:25:48 oh, and that isn't to say that your changes are superfluous, just that you're probably going to run into the problem everyone else did :)
19:26:03 if you have a variety of hardware to test _and_ those changes, though, I think a bunch of people are at least interested
19:39:49 LOTS of changes in the category of, "LARGE TEST SPACE due to amount of HW." (side-eyes at any "Intel Common Code" updates for any Intel NIC drivers)
19:40:52 "Pay no attention to that `ixgbe-e610` branch in github.com/danmcd/illumos-gate ... NONE I tell you. I haven't even BOOTED it yet, but it does compile..."
19:42:14 (E610 is an ixgbe extension, thankfully. How much of a PITA it is remains to be seen. I wish Intel had released E610 instead of E700 10Gbit a long time ago.)
19:47:10 at the same time.. you either have to figure out where the line is and accept the risk, or decide not to do it...
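Circling back to the boot -v question at 19:18:20: the flag ends up in boothowto, and kernel code typically gates extra chatter on it as sketched below. The function and message here are made up for illustration; boothowto and RB_VERBOSE come from <sys/reboot.h>.

    #include <sys/types.h>
    #include <sys/reboot.h>	/* boothowto, RB_VERBOSE */
    #include <sys/cmn_err.h>

    /*
     * Illustrative only: emit extra detail when the system was booted
     * with -v.  (cmn_err's "?" message prefix offers similar gating:
     * such messages reach the console only on a verbose boot.)
     */
    static void
    example_verbose_note(int unit)
    {
    	if (boothowto & RB_VERBOSE)
    		cmn_err(CE_CONT, "example: probed unit %d\n", unit);
    }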
19:47:35 i suspect most of the issues are likely to be with very old machines (just because otherwise they'd probably break windows or at least linux)
19:47:47 since AFAIK they're using the same common code we'd be using
19:48:44 if we know we need to test on x,y,z we can work on trying to get that to happen
19:49:43 if it's just some amorphous list of systems, it's not going to happen
19:53:01 Well, it's more complicated than that (works on windows/linux, must be fine) in some cases. E.g., i40e common code updates may also require a newer NVM image. If your hardware is not flashed with that image you now have effectively dead parts (ask me how I know). If you want to update that image you need to boot into Linux/FreeBSD and use the tools provided by intel to do so.
19:53:46 I thought you could update it as an EFI binary.
19:53:58 danmcd: we have no way to update it AFAIK
19:54:01 s/as an/with an/^
19:54:12 Also I thought your mobo-vendor could help there too.
19:54:13 and pretty sure Robert confirmed as much to me recently (or maybe in this very room)
19:54:33 Oh oh oh... I'm thinking onboard i40e, not PCIe card i40e.
19:54:43 yea I'm talking standalone
19:55:04 I wouldn't buy an i40e card, but many folks are stuck with 'em on their mobos.
19:55:43 My advice to people who want > 10Gbit is "Chelsio or Mellanox, and just jump straight to 100Gbit".
19:55:46 I have "dead" i40e cards in my office because of our last update to the common code
19:55:54 OUCH OUCH OUCH.
19:56:28 rzezeski, dead as in fried? How'd that happen?
19:57:18 nomad: no, dead as in our driver won't attach to it because the last common code update gated itself on a newer NVM image which my parts from circa 2017/2018 do not have
19:57:32 ah
19:57:57 well... if you're looking for someone who might be able to make use of them, I've got old stuff in $HOMELAB that might be happy :)
19:58:34 my point is that while updating common code that is already running in the wild on other operating systems is generally acceptable (to me at least), it might still cause heartburn in other ways, especially since we don't have the first-class support for third-party tools that the other operating systems have
19:58:50 * nomad nods
19:59:00 do we have information in the PCI space to attach different drivers to different parts in that case?
19:59:08 not that that is a particularly tidy solution
20:00:08 * nomad is ~1 hr from vacation.
20:00:20 What is this concentration people are so interested in?
20:01:07 richlowe: took me a second to realize what you are getting at. I don't remember how the NVM version number is determined (and I don't feel like looking that up right now), but even if we could technically do such a trick it would mean carrying two different source code versions (or I guess carrying feature flags throughout the code). I feel like it could get really messy.
20:06:51 yeah, no, that's what I meant by not tidy
20:07:11 but I know e.g. joyent went that route once with mr/dr_sas just for perceived safety
20:08:46 oh interesting, yea it's a neat idea that I've never thought of
20:10:08 We introduced some mediators to allow people to switch between a couple of drivers, maybe that was lmrc and cpqary.. something like that.
20:10:18 I guess it really depends on how much it diverges and how often the churn is. In this case it maybe wouldn't be terrible since the common code updates aren't too frequent.
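On richlowe's question at 19:59:00 about using information in PCI config space to steer parts to different drivers (or code paths): at attach time a driver can read the IDs with the regular DDI config-space routines, roughly as below. The subsystem-ID match and the notion of a "legacy path" are purely illustrative, and as rzezeski notes the real gate in the i40e common code is the NVM image version, which may not be recoverable this way.

    #include <sys/ddi.h>
    #include <sys/sunddi.h>
    #include <sys/pci.h>

    /*
     * Hypothetical attach-time check: read the subsystem IDs from PCI
     * config space and steer older parts to a legacy code path instead
     * of refusing to attach.  The ID values matched here are made up.
     */
    static boolean_t
    example_is_legacy_part(dev_info_t *dip)
    {
    	ddi_acc_handle_t cfg;
    	uint16_t subvid, subsys;
    	boolean_t legacy = B_FALSE;

    	if (pci_config_setup(dip, &cfg) != DDI_SUCCESS)
    		return (B_FALSE);

    	subvid = pci_config_get16(cfg, PCI_CONF_SUBVENID);
    	subsys = pci_config_get16(cfg, PCI_CONF_SUBSYSID);

    	/* Purely illustrative match against a known-old part. */
    	if (subvid == 0x8086 && subsys == 0x0001)
    		legacy = B_TRUE;

    	pci_config_teardown(&cfg);
    	return (legacy);
    }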
20:11:46 But I guess we didn't really break anyone but myself, given no one ever showed up to complain
20:31:55 i mean, if we wanted to be snarky.. we could probably use ebpf on linux to figure out exactly what intel's tool is sending to the driver and recreate that :P
20:44:43 hah
20:44:50 there's an option immu-dmar-print
20:44:58 except it's not _quite_ hooked up
20:45:05 it sets immu_dmar_print
20:45:18 which is otherwise never referenced, and instead there's another 'dmar_print' that's actually used
20:45:32 (also.. 'dmar' sounds like something from star trek)
21:49:10 rzezeski: I tried to update your test-runner bug, but I think the AI loons ate it: I get definite warning messages when I run the tests saying the missing ones have "failed verification"
21:49:27 [illumos-gate] 17534 fct: buffer freed to wrong cache -- Vitaliy Gusev
23:32:57 hrm..
23:34:03 does a pci host bridge have a pci bus/device/function or is that strictly for the 'stuff' behind the host bridge?
23:54:15 in Linux lspci my PCI bridges show up with various BDF values
23:54:57 the function is sometimes 0 and sometimes other values
23:55:51 the "host bridge", however, is 00:00.0, so that must be the domain root or whatever the proper term for it is
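To make the immu-dmar-print disconnect described at 20:44:50-20:45:18 concrete: the property sets one global while the debug output tests another, so the knob has no effect. The variable names below follow the chat, but the lookup call and the surrounding function are only a sketch of how such a boolean property is commonly read, not the actual immu code; the fix is simply to reference the same variable in both places.

    #include <sys/ddi.h>
    #include <sys/sunddi.h>
    #include <sys/cmn_err.h>

    int immu_dmar_print;	/* set from the "immu-dmar-print" property... */
    int dmar_print;		/* ...but this is what the debug paths test */

    /* Sketch only: read the tunable, then (incorrectly) ignore it. */
    static void
    example_read_dmar_print(dev_info_t *dip)
    {
    	immu_dmar_print = ddi_prop_get_int(DDI_DEV_T_ANY, dip,
    	    DDI_PROP_DONTPASS, "immu-dmar-print", 0);

    	/* Bug: checks the wrong global, so the property does nothing. */
    	if (dmar_print)
    		cmn_err(CE_CONT, "DMAR table parsing debug enabled\n");
    }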