-
jclulow
spicywolf: If the rustup bits don't work on OI, that's definitely an OI bug that should get sorted out
-
jclulow
They're intentionally build to only use things that every illumos system should have in /usr/lib/amd64
-
jclulow
*built
-
spicywolf
it was a while ago. last time I used OI was ~2023 or so.
-
spicywolf
I can spin up a VM and check it sometime today if need be.
-
jbk
the other issues just look to largely be a matter of just adding some of the missing support to various crates...
-
Smithx10
Anyone run into this while running illumos in libvirt / kvm-amd?
gist.github.com/Smithx10/c37d0560c59954457abdcccb43a8a50a
-
Smithx10
Getting a nice panic
-
richlowe
huh
-
richlowe
Smithx10: you're going to want prom_debug=1, kbm_debug=1 etc as a first step probably, or to boot with kmdb and let it catch the panic
-
richlowe
or to do `startup_modules+10a=I` and get an idea what called that bcopy
-
jbk
or startup_modules::dis and look at the instructions at/right before +0x10a
-
jbk
(on my system, that's smbios_open() which does call bcopy a few times)
-
Smithx10
-
tsoome_
you need to use: boot -k -B prom_debug=1 kbm_debug=1
-
tsoome_
with comma between prom_debug and kbm_debug
-
tsoome_
(copy-paste fun)
-
Smithx10
tsoome_: boot -k -B prom_debug=1,kbm_debug=1 ?
-
tsoome_
yes
-
Smithx10
-
tsoome_
so your bcopy was operating with rdi: fffffe0bd2770000 rsi: fffffe0b8aa06370, and it faulted while accessing the address in rsi. you can try ::stack -t and startup_modules::dis (the latter to see what is around +0x10a).
-
richlowe
startup_kernel: bi->bi_smbios is 0x0
-
richlowe
so we're going to do the walk to find the smbios
-
richlowe
unfortunately, the logic after that is too hairy for me to follow in text without a machine
-
richlowe
Smithx10: I would break in smbios_open, and step until we isolate the bad copy (unless you're familiar with mdb, in which case you could do it faster?)
-
richlowe
if you're _very_ unfamiliar with mdb you want to boot with `-kd` to enter the debugger as soon as possible, and then `smbios_open:b` to set the breakpoint, and `:c` to continue. When you hit the breakpoint "::step" will step, "::step over" will step, but not into calls, and ":c" will continue. You could probably `::step over` until it crashes
-
richlowe
jbk: if you're actually here and thinking x86-y thoughts, maybe you're better at this?
-
Smithx10
richlowe: i am very unfamiliar with mdb, only used it to debug userspace applications typically
-
richlowe
once you get in there, it's very similar :)
-
Smithx10
-
richlowe
uh
-
richlowe
hm
-
jbk
i can't think of anything better offhand.. but looking at smbios_open() we only appear to start doing bcopy()s once we think we've found the table
-
richlowe
jbk: yeah, I'm just not sure I can work out which it is manually
-
jbk
it'd be nice if bcopy set up frame pointers so we could easily see which bcopy is the culprit
-
richlowe
but my x86 is rusty
-
tsoome_
bi_smbios is 0 on a BIOS system, as dboot populates bi_smbios from the EFI system table. and yes, either we have smb2 or smb3.
-
jbk
i think i might have once or twice dumped the stack and figured out a similar issue, but that was years ago :)
-
richlowe
unfortunately, skipping the framepointer there is probably worthwhile (or was)
-
richlowe
on arm, I have been adding them whenever I needed one
-
richlowe
they can always go away if they matter
-
jbk
tsoome_: so perhaps it might be worth trying EFI boot instead of bios?
-
Smithx10
yeah, that's what I'm thinking
-
tsoome_
quite likely.
-
richlowe
well, it might boot
-
richlowe
but that's not really a solution
-
jbk
richlowe: i still have a branch i haven't touched in forever that does that for userland, but i never felt like i was able to get good measurements on the impact
-
richlowe
certainly something worth adding to a bug report.
-
richlowe
but it's probably best to actually get to the bottom of it
-
richlowe
jbk: if I were doing it on x86 I'd see if I could condition it on DEBUG
-
richlowe
or on _something_, I'm not really keen on DEBUG making that big a difference
-
richlowe
it's like how some people want SOURCEDEBUG=yes to imply -O0, and it's like... at what point are you changing the thing you're debugging so far that you're debugging _something else_ now
-
jbk
the one data point i have is that at least building all of pkgsrc didn't seem to have any noticeable impact on build times (jperkin was kind enough to test it back after i did it)
-
jbk
or i guess had.. it's been long enough not sure how valid it'd be anymore
-
tsoome_
smb2 or smb3 can be figured out from disasm and register values, but because we do get to bcopy, one was set, and faulting means the table(s) don't hold the values we assume they should...
-
Smithx10
Yeah, it's booting
-
tsoome_
you can try smbios command to check out the tables
-
richlowe
tsoome_: how do the tables get into physmem in the first place? Would they land in the same spot in both a bios and efi boot?
-
richlowe
it seems unlikely, but it'd be convenient
-
tsoome_
I'm not quite sure the same spot is guaranteed.
-
jbk
i'm guessing it's probably a property of whatever ovmf image they're using...
-
tsoome_
also I think there have been some funny bugs around like finding 64-bit pointer where 32-bit pointer was expected (or vice-versa)
-
tsoome_
I'd say, if there is CSM, you probably want to stick with UEFI (because the BIOS emulation often is buggy)
-
tsoome_
but, on this crash -- if the fault was on the pointer in rsi, that's the second argument to bcopy, i.e. the pointer to the destination, and that one should be allocated with a known size, and the same size should be used by bcopy. So it makes me wonder ....
-
tsoome_
(unless I misread something:)
-
richlowe
integer problems?
-
richlowe
doesn't look easily possible
-
tsoome_
no idea, perhaps. maybe ::stack -t would tell more, but likely some ::bp and stepping with mdb would reveal it.. or if there are enough resources, build with debug printouts:)
-
richlowe
I can't follow the 2 v. 3 behaviour in the copying
-
richlowe
look at line ~155
-
richlowe
a look at the size param would be good, too
-
richlowe
rdx is _way_ too big
-
richlowe
I don't see us being careful about garbage data
-
richlowe
but I think we're careful _enough_
-
tsoome_
I guess it should be smbe_stlen, and therefore smb3, as smbe_stlen in 2.1 is a 16-bit int.
-
richlowe
and it's capped to SMB_ENTRY_MAXLEN, or is that innately (or should be)
-
richlowe
it'd be good to get the broken system broken again and in the debugger
-
tsoome_
yea. so it means some ::bp and ::next/::step/::cont is in order:)