01:11:44 Hey all, so I'm having an issue with the zig test suite. They have this thing where they compile various code for different targets and try to run it. If the executable can't exec, the test runner will handle that fine. But in this case it's building a static executable with no interpreter specified and that seems to cause us to immediately SIGKILL the process and I think perhaps we are the first platform that does this. So the
01:11:45 runner gets confused.
01:12:14 I could go code spelunking but if anyone could shortcut my knowledge here that would help. As in, the SIGKILL is expected?
01:12:52 Can you share the elf header for reference?
01:13:12 That is, the runner can handle exec(2) returning ENOEXEC just fine, but having a SIGKILL delivered is not something it expects
01:13:14 sure
01:13:21 And is it really static?
01:13:29 if it is really static, no interpreter is fine
01:13:41 are you sure it doesn't get killed for executing a random system call?
01:13:43 Aside from the fact that it probably won't work on us.
01:14:06 https://gist.github.com/rzezeski/6c4169e3d7085958cf2404e6e5ddb958
01:14:14 rmustacc: see the comment on that gist
01:14:44 And it doesn't have a dynamic section I take it?
01:15:18 nbjoerg: I should rephrase: the test runner is compiling this executable for a linux target and a macos target, then tries to run them. If it gets ENOEXEC it just ignores it, but we are also delivering a SIGKILL, so it reports an error.
01:15:37 https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/exec/elf/elf.c?r=6233c8a8#1046 maybe?
01:15:40 rmustacc: What would I search for?
01:15:48 I see no Interpreter section
01:16:22 It would be elfdump -d, but if it's for another platform that's probably neither here nor there.
01:16:23 (and lines 726-741 in the same file)
01:16:40 yea elfdump -d shows nothing
01:17:05 We shouldn't be in that because there's no PT_INTERP..
01:17:09 The 726-741 bit.
01:17:57 rzezeski: Can you put an fbt on exec_args:return for this?
01:18:36 The SIGKILL is probably because we got far enough that the child is in an undefined state so it had to kill it and couldn't just drop an ENOEXEC at that point.
01:18:48 For example, if you were partway through replacing the process image or stack or whatever.
01:20:10 rzezeski: Actually, also add mapelfexec to that.
01:21:47 rmustacc: 0 for both
01:26:18 looks like I'll be writing another blog post here soon, haha
01:28:02 OK, I have other questions likely, but they'd require the elf object.
01:34:41 rzezeski: If I had to guess, it was that we have a program header but no PT_INTERP.
01:35:29 rmustacc: okay, yea I DM'd you a path to the file
01:35:36 I figure I might have to do some spelunking on this one
01:35:42 Yeah, it was from looking at that I'm making that guess.
01:35:46 k
01:35:49 thank you for looking
01:36:02 But I don't know for certain.
01:36:23 Specifically https://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/exec/elf/elf.c?r=6233c8a8#634.
01:37:09 I was already wrong with other guesses. But most of the rest of that is on having an interpreter.
01:37:25 what is the uphdr?
01:37:33 oh
01:37:41 is that the program header?
01:37:48 Yes, from mapelfexec.
01:38:01 cool, well this at least gives me something to look into, so that's great
01:38:45 Unfortunately my stomach is yelling at me so I need to make dinner now. Thanks for all the help everyone. I'll let you know what I find later.
01:38:58 So anyways, given how much is going on, I expect that the child process getting killed is unfortunately somewhat reasonable.
01:39:15 Because it's far enough along and we can't get you back to an ENOEXEC cleanly.
01:39:56 Doesn't mean we can't maybe figure it out sooner or do something different. Just all I know so far.
01:40:48 We appear to be the first platform acting this way, according to the creator of Zig. FWIW.
01:40:54 But doesn't mean we are wrong.
01:41:05 So yea, first step is I need to make sure I understand why it's happening.
01:46:38 rzezeski: Are they expecting to get an ENOEXEC and for it not to be run or?
01:46:49 rmustacc: correct
01:46:58 Based on what property of the elf object?
01:47:05 I...dunno
01:56:38 I did confirm you're in this case with DTrace.
01:57:48 I don't know enough about the expected semantics. If I were to guess, the phdr is supposed to have semantics for an interpreter.
10:42:51 andyf: I didn't know about the strings command, thanks!
10:43:40 I've noticed I have build errors because I'm running out of space in /tmp
10:43:54 How big should I make /tmp to be safe for building?
10:44:16 (And out of curiosity, what are the specs of the machines you guys are using to build illumos?)
10:44:25 It really is only a superficial check for something like this because it just looks for sequences of printable characters over a certain length. However, it clearly shows the original problem, and a quick before and after proves things look better now.
10:44:56 A more robust approach would be to look at the string table, but I don't think I'd bother unless an advocate comes back and asks for it.
10:45:25 How could I look at the string table if it was needed? Inspecting the binary?
10:45:43 I usually use a VM with at least 16 GiB of RAM, and I limit the number of build threads.. although the illumos.sh is supposed to do that for you based on the amount of installed memory.
10:52:44 I'm using an OI VM with 8 vcpus and 8GiB RAM running in a bhyve zone on an omnios system, which is also hosting a windows vm for other people. The host has 32GiB ram but the windows machine is using 16 and I'm running into hangs if I allocate more memory for my vm (e.g. 12GB).
I want to investigate that once I'm more comfortable with the os
10:54:14 pilonsi - something like this (this is omnios but OI should also have GNU readelf somewhere) `/usr/gnu/bin/readelf -p .rodata.str1.8 /kernel/misc/amd64/kbtrans`
10:55:44 (and someone may be a long later to tell me how to do it with native illumos tools :) )
10:55:49 *along
10:56:15 You can see the embedded tabs in the existing message:
10:56:16 `[ a0] kbtrans_queueevent: Can't allocate ^I^I^I^I^Iblock for event.`
11:17:41 Thanks! I'll look into that as well later as an exercise
14:21:40 Do you know why nvme disk device names are in the form of c3t00E04C1C0C2F6F08d0 instead of the simpler names sata disks have?
14:23:02 if an nvme device supports a unique id, that is used for the target number
14:23:55 similar concept as a SAS WWN, really
14:33:11 WWNs are great
14:38:24 and i suspect the reason SATA drives don't is just because we never really exposed it for SATA drives
14:38:51 at least until relatively recently
15:25:41 can we get WWNs for sata?
15:26:52 sure, but can we fix zfs rpool import code to handle device path changes in all possible cases? :)
15:30:21 sjorge: yes (if it has one)..
15:30:44 previously we weren't translating page 83
15:32:13 so most sata ssd would have one then
15:36:10 yeah, probably
15:36:34 even a lot of HDDs will as well (I'm guessing most likely really old SATA HDDs would be the ones that don't)
15:42:17 Woodstock: main advantage is that the WWN-based names stay the same if you accidentally swap cables around.
15:43:38 jbk: all the SATA spinning rust I've plugged into SAS controllers would appear to have WWNs
15:49:11 yeah, i'm thinking like _really_ old drives
16:02:56 hm. pata -> sata bridges perhaps?
:)
17:57:58 [illumos-gate] 15910 Favor judgment over judgement -- Dan Cross
18:10:22 [illumos-gate] 11616 Support getaddrinfo() with socktype 0/AI_NUMERICSERV -- Andy Fiddaman
20:55:31 sommerfeld: so I'm in the libc locale code, for reasons, and I remember you had done stuff there. Did that ever land?
20:56:21 sommerfeld: and if not, do you remember thinking "Huh, this seems buggy" about anything around the xpg7 bits, especially?
20:58:02 Is there an mdb command that can give me the value at some offset of a pointer? I know you can do `,40::dump` to repeat and dump the data. But I am looking at a problem where the offset looks different in my corefile than what CTF seems to think the real offset is
20:58:35 you can just do math on
20:58:49 `+5/J`
20:58:50 or whatever
20:58:53 okay, I was just curious if there was anything built in that would do the math
20:59:02 ahh that makes sense
20:59:03 thanks!
20:59:33 remember the default radix is hex
20:59:40 0t for decimal
20:59:48 yup, thanks
21:04:35 richlowe: no, I need to get back to that.
21:04:55 I think what I am seeing is two different versions of libssl being pulled into my binary (1.x and 3.x). So mdb shows me valid data if I do `::print SSL session`
21:05:18 but `::dis` shows an offset different from `::offsetof SSL session`
21:05:18 multi-library messes. ugh.
21:05:25 pldd should show if that's happening
21:05:25 Well, multiple versions is going to be very sad.
21:05:37 if you're unsure about that bit
21:06:02 jbk: pldd shows me:
21:06:04 /lib/amd64/libssl.so.3
21:06:06 /opt/local/lib/libssl.so.1.1
21:06:13 which is what tipped me off
21:06:38 richlowe: my messing with the locale stuff was excruciatingly localized to optimizing how you find the current locale
21:06:48 so the pkg mediator version vs the pkgin version
21:07:06 do you know what version it's supposed to use?
21:07:53 papertigers: what is the executable showing this?
you probably need to link it differently or otherwise be more selective about what libraries get pulled in.
21:08:33 sommerfeld: this was a result of me running `cargo install cargo-outdated`
21:08:44 aka a rust binary
21:09:15 and yeah, I just built the binary on a different illumos box that doesn't have pkgin set up and everything works as expected
21:09:18 I believe the rust crate uses pkgconfig
21:09:50 which probably means it finds the pkgsrc libssl, where something else finds the system one.
21:09:55 ldd -v can help you there
21:17:35 richlowe: yup that makes it pretty clear :) -- the rust binary cargo-outdated requires version OPENSSL_3.0.0 while libcurl that's getting pulled in is depending on OPENSSL_1_1_1
21:20:36 hmm maybe I will turn this into a fun short blog post
21:24:54 yeah, build a newer libcurl against openssl 3
21:27:02 the real solution is probably to clean up my dev zone so that all of these deps come from pkg and not a mix of pkg/pkgin
23:32:04 or link against a pkgsrc libcurl