-
rzezeski
Hey all, so I'm having an issue with the zig test suite. They have this thing where they compile various code for different targets and try to run it. If the executable can't exec, the test runner will handle that fine. But in this case it's building a static executable with no interpreter specified, and that seems to cause us to immediately SIGKILL the process, and I think perhaps we are the first platform that does this. So the runner gets confused.
-
rzezeski
I could go code spelunking but if anyone could shortcut my knowledge here that would help. As in, the SIGKILL is expected?
-
rmustacc
Can you share the elf header for reference?
-
rzezeski
That is, the runner can handle exec(2) returning ENOEXEC just fine, but having a SIGKILL delivered is not something it expects
-
rzezeski
sure
-
rmustacc
And is it really static?
-
nbjoerg
if it is really static, no interpreter is fine
-
nbjoerg
are you sure it doesn't get killed for executing a random system call?
-
rmustacc
Aside from the fact that it probably won't work on us.
-
rzezeski
-
rzezeski
rmustacc: see the comment on that gist
-
rmustacc
And it doesn't have a dynamic section I take it?
-
rzezeski
nbjoerg: I should rephrase, the test runner is compiling this executable for a linux target and macos target, it then tries to run them, if it gets ENOEXEC then it just ignores it, but we are also delivering a SIGKILL so it reports an error.
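The two outcomes described here can be told apart from the parent side. The following is a minimal Python sketch, not the Zig test runner's actual code; `classify_run` and its return strings are hypothetical names used only for illustration.

```python
import errno
import signal
import subprocess


def classify_run(argv):
    """Run argv and report how it ended (hypothetical helper)."""
    try:
        proc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError as exc:
        # exec itself failed; ENOEXEC means "not runnable on this host".
        if exc.errno == errno.ENOEXEC:
            return "skip: ENOEXEC"
        raise
    if proc.returncode < 0:
        # No exec error was reported, but the child then died from a signal.
        return "fail: killed by " + signal.Signals(-proc.returncode).name
    return "exit: %d" % proc.returncode
```

A runner built this way would ignore the first case and flag the second, which matches the behaviour being described: a SIGKILL shows up as a negative return code, never as an `OSError`.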
-
jbk
-
rzezeski
rmustacc: What would I search for?
-
rzezeski
I see no Interpreter section
-
rmustacc
It would be elfdump -d, but if it's for another platform that's probably neither here nor there.
-
jbk
(and lines 726-741 in the same file)
-
rzezeski
yea elfdump -d shows nothing
-
rmustacc
We shouldn't be in that because there's no PT_INTERP..
-
rmustacc
The 726-741 bit.
-
rmustacc
rzezeski: Can you put an fbt on exec_args:return for this?
-
rmustacc
The SIGKILL is probably because we got far enough that the child is in an undefined state so it had to kill it and couldn't just drop an ENOEXEC at that point.
-
rmustacc
For example, if you were part way through replacing the process image or stack or whatever.
-
rmustacc
rzezeski: Actually, also add mapelfexec to that.
-
rzezeski
rmustacc: 0 for both
-
rzezeski
looks like I'll be writing another blog post here soon, haha
-
rmustacc
OK, I have other questions likely, but they'd require the elf object.
-
rmustacc
rzezeski: If I were to guess, it's that we have a program header but no PT_INTERP.
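One way to check which program header entries an object carries, without relying on platform tools, is to read them straight from the file. This is a rough Python sketch that only handles little-endian ELF64 images; the field offsets and the `PT_INTERP`/`PT_PHDR` values come from the ELF specification, while `program_header_types` is a hypothetical helper name. It says nothing about what the kernel does with the result.

```python
import struct

PT_INTERP = 3  # entry naming the runtime linker (interpreter)
PT_PHDR = 6    # entry describing the program header table itself


def program_header_types(image):
    """Return the p_type of every program header in a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF object")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("sketch only handles ELFCLASS64, little-endian")
    # e_phoff is at 0x20; e_phentsize and e_phnum follow at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    # p_type is the first 4 bytes of each program header entry.
    return [
        struct.unpack_from("<I", image, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]
```

Calling it on the bytes of the object and testing `PT_INTERP in types` (and, if relevant, `PT_PHDR in types`) would show whether the object matches the guess above.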
-
rzezeski
rmustacc: okay, yea I DM'd you a path to the file
-
rzezeski
I figure I might have to do some spelunking on this one
-
rmustacc
Yeah, it was from looking at that I'm making that guess.
-
rzezeski
k
-
rzezeski
thank you for looking
-
rmustacc
But I don't know for certain.
-
rmustacc
-
rmustacc
I was already wrong with other guesses. But most of the rest of that is on having an interpreter.
-
rzezeski
what is the uphdr?
-
rzezeski
oh
-
rzezeski
is that the program header?
-
rmustacc
Yes, from mapelfexec.
-
rzezeski
cool, well this at least gives me something to look into, so that's great
-
rzezeski
Unfortunately my stomach is yelling at me so I need to make dinner now. Thanks for all the help everyone. I'll let you know what I find later.
-
rmustacc
So anyways, given how much is going on, I expect that the child process getting killed is unfortunately somewhat reasonable.
-
rmustacc
Because it's far enough along and we can't get you back to an ENOEXEC cleanly.
-
rmustacc
Doesn't mean we can't maybe figure out sooner or do something different. Just all I know so far.
-
rzezeski
We appear to be the first platform acting this way, according to the creator of Zig. FWIW.
-
rzezeski
But doesn't mean we are wrong.
-
rzezeski
So yea, first step is I need to make sure I understand why it's happening.
-
rmustacc
rzezeski: Are they expecting to get an ENOEXEC and for it not to be run or?
-
rzezeski
rmustacc: correct
-
rmustacc
Based on what property of the elf object?
-
rzezeski
I...dunno
-
rmustacc
I did confirm you're in this case with DTrace.
-
rmustacc
I don't know enough about the expected semantics. If I were to guess the phdr is supposed to have semantics for an interpreter.
-
pilonsi
andyf: I didn't know about the strings command, thanks!
-
pilonsi
I've noticed I have build errors because I'm running out of space in /tmp
-
pilonsi
How big should I make /tmp to be safe for building?
-
pilonsi
(And out of curiosity what's the specs of the machines you guys are using to build illumos?)
-
andyf
It really is only a superficial check for something like this because it just looks for sequences of printable characters over a certain length. However it clearly shows the original problem and a quick before and after proves things look better now.
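The scan described here can be approximated in a few lines. This is a Python sketch of the idea rather than the implementation of strings itself; the minimum length of 4 and the decision to count tab as printable are assumptions, and `printable_runs` is a hypothetical name.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII (plus tab) at least min_len bytes long."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Because it knows nothing about sections or string tables, it will also report any stretch of non-string data that happens to look printable, which is why it is only a superficial check.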
-
andyf
A more robust approach would be to look at the string table, but I don't think I'd bother unless an advocate comes back and asks for it.
-
pilonsi
How could I look at the string table if it was needed? Inspecting the binary?
-
andyf
I usually use a VM with at least 16 GiB of RAM, and I limit the number of build threads.. although the illumos.sh is supposed to do that for you based on the amount of installed memory.
-
pilonsi
I'm using an OI VM with 8 vcpus and 8GiB RAM running in a bhyve zone on an omnios system, on which I'm also hosting a windows vm for other people. The host has 32GiB ram but the windows machine is using 16, and I'm running into hangs if I allocate more memory for my vm (i.e. 12GB). I want to investigate that once I'm more comfortable with the os
-
andyf
pilonsi - something like this (this is omnios but OI should also have GNU readelf somewhere) /usr/gnu/bin/readelf -p .rodata.str1.8 /kernel/misc/amd64/kbtrans
-
andyf
(and someone may be a long later to tell me how to do it with native illumos tools :) )
-
andyf
*along
-
andyf
You can see the embedded tabs in the existing message:
-
andyf
[ a0] kbtrans_queueevent: Can't allocate ^I^I^I^I^Iblock for event.
-
pilonsi
Thanks! I'll look into that as well later as an exercise
-
pilonsi
Do you know why nvme disk device names are in the form of c3t00E04C1C0C2F6F08d0 instead of the simpler names sata disks have?
-
Woodstock
if an nvme device supports a unique id, that is used for the target number
-
Woodstock
similar concept as a SAS WWN, really
-
sjorge
WWN's are great
-
jbk
and i suspect the reason SATA drives don't is just because we never really exposed it for SATA drives
-
jbk
at least until relatively recently
-
sjorge
can we get wwn's for sata?
-
Woodstock
sure, but can we fix zfs rpool import code to handle device path changes in all possible cases? :)
-
jbk
sjorge: yes (if it has one)..
-
jbk
previously we weren't translating page 83
-
sjorge
so most sata ssd would have one then
-
jbk
yeah, probably
-
jbk
even a lot of HDDs will as well (I'm guessing most likely really old SATA HDDs would be the ones that don't)
-
sommerfeld
Woodstock: main advantage is that the WWN-based names stay the same if you accidentally swap cables around.
-
sommerfeld
jbk: all the SATA spinning rust I've plugged into SAS controllers would appear to have WWNs
-
jbk
yeah, i'm thinking like _really_ old drives
-
Woodstock
hm. pata -> sata bridges perhaps? :)
-
gitomat
[illumos-gate] 15910 Favor judgment over judgement -- Dan Cross <cross⊙oc>
-
gitomat
[illumos-gate] 11616 Support getaddrinfo() with socktype 0/AI_NUMERICSERV -- Andy Fiddaman <illumos⊙fn>
-
richlowe
sommerfeld: so I'm in the libc locale code, for reasons, and I remember you had done stuff there. Did that ever land?
-
richlowe
sommerfeld: and if not, do you remember thinking "Huh, this seems buggy" about anything around the xpg7 bits, especially?
-
papertigers
Is there an mdb command that can give me the value at some offset of a pointer? I know you can do the <addr>,40::dump to repeat and dump the data. But I am looking at a problem where the offset looks different in my corefile than what CTF seems to think the real offset is
-
richlowe
you can just do math on <addr>
-
richlowe
`<addr>+5/J`
-
richlowe
or whatever
-
papertigers
okay, I was just curious if there was anything built in that would do the math
-
papertigers
ahh that makes sense
-
papertigers
thanks!
-
richlowe
remember the default radix is hex
-
richlowe
0t<blah> for decimal
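The radix rule is easy to trip over when doing offset math by hand. Here is a small Python sketch of the rule as stated in the chat (bare numbers are hex, a `0t` prefix means decimal); `parse_mdb_int` is a hypothetical helper, and it accepts a `0x` prefix as well.

```python
def parse_mdb_int(text):
    """Parse an integer using a hex default and a 0t prefix for decimal."""
    lowered = text.lower()
    if lowered.startswith("0t"):
        return int(lowered[2:], 10)
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    return int(lowered, 16)
```

Under this rule a bare `40` reads as 64, while `0t40` reads as 40.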
-
papertigers
yup, thanks
-
sommerfeld
richlowe: no, I need to get back to that.
-
papertigers
I think what I am seeing is different versions of libssl being pulled into my binary (version 1.x and v 3.x). So mdb shows me valid data if I do <addr>::print SSL session
-
papertigers
but the ::dis shows an offset different from ::offsetof SSL session
-
sommerfeld
multi-library messes. ugh.
-
jbk
pldd should show if that's happening
-
rmustacc
Well, multiple versions is going to be very sad.
-
jbk
if you're unsure about that bit
-
papertigers
jbk: pldd shows me:
-
papertigers
/lib/amd64/libssl.so.3
-
papertigers
/opt/local/lib/libssl.so.1.1
-
papertigers
which is what tipped me off
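Spotting this kind of duplicate by eye gets harder as the library list grows. A rough Python sketch that groups a list of loaded object paths (for example, lines copied from pldd output) by library name could look like this; `duplicate_libraries` and its name-matching pattern are assumptions for illustration only.

```python
import os
import re
from collections import defaultdict


def duplicate_libraries(paths):
    """Group loaded objects by library name; keep names loaded from 2+ paths."""
    groups = defaultdict(list)
    for path in paths:
        # Split "libssl.so.1.1" into the name "libssl" and a version suffix.
        m = re.match(r"(lib.+?)\.so(\.[\w.]+)?$", os.path.basename(path))
        if m:
            groups[m.group(1)].append(path)
    return {name: found for name, found in groups.items() if len(found) > 1}
```

Given the two paths above, it would report `libssl` with both `/lib/amd64/libssl.so.3` and `/opt/local/lib/libssl.so.1.1`.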
-
sommerfeld
richlowe: my messing with the locale stuff was excruciatingly localized to optimizing how you find the current locale
-
papertigers
so the pkg mediator version vs the pkgin version
-
jbk
do you know what version it's supposed to use?
-
sommerfeld
papertigers: what is the executable showing this? you probably need to link it differently or otherwise be more selective about what libraries get pulled in.
-
papertigers
sommerfeld: this was a result of me running `cargo install cargo-outdated`
-
papertigers
aka a rust binary
-
papertigers
and yeah, I just built the binary on a different illumos box that doesn't have pkgin setup and everything works as expected
-
richlowe
I believe the rust crate uses pkgconfig
-
richlowe
which probably means it finds the pkgsrc libssl, where something else finds the system one.
-
richlowe
ldd -v can help you there
-
papertigers
richlowe: yup that makes it pretty clear :) -- the rust binary cargo-outdated requires version OPENSSL_3.0.0 while libcurl that's getting pulled in is depending on OPENSSL_1_1_1
-
papertigers
hmm maybe I will turn this into a fun short blog post
-
sommerfeld
yeah, build a newer libcurl against openssl 3
-
papertigers
the real solution is probably to clean up my dev zone so that all of these deps come from pkg and not a mix of pkg/pkgin
-
jbk
or link against a pkgsrc libcurl