07:35:59 spicywolf: If the rustup bits don't work on OI, that's definitely an OI bug that should get sorted out
07:36:43 They're intentionally build to only use things that every illumos system should have in /usr/lib/amd64
07:36:47 *built
11:40:10 it was a while ago. last time I used OI was ~2023 or so.
11:41:15 I can spin up a VM and check it sometime today if need be.
14:41:48 the other issues just look to largely be a matter of just adding some of the missing support to various crates...
19:40:17 Anyone run into this while running illumos in libvirt / kvm-amd? https://gist.github.com/Smithx10/c37d0560c59954457abdcccb43a8a50a
19:40:37 Getting a nice panic
19:43:20 huh
19:44:38 Smithx10: you're going to want prom_debug=1, kbm_debug=1 etc as a first step probably, or to boot with kmdb and let it catch the panic
19:45:21 or to do `startup_modules+10a=I` and get an idea what called that bcopy
19:48:08 or startup_modules::dis and look at the instructions at/right before +0x10a
19:48:49 (on my system, that's smbios_open() which does call bcopy a few times)
19:51:45 Alright im here: https://gist.github.com/Smithx10/c37d0560c59954457abdcccb43a8a50a?permalink_comment_id=5734604#gistcomment-5734604 at the debugger
19:55:45 you need to use: boot -k -B prom_debug=1 kbm_debug=1
19:56:20 with comma between prom_debug and kbm_debug
19:56:35 (copy-paste fun)
19:59:21 tsoome_: boot -k -B prom_debug=1,kbm_debug=1 ?
19:59:29 yes
20:01:07 here: https://gist.github.com/Smithx10/8694070f9c7615cf273e25d1643095c1
20:22:21 so your bcopy was operating with rdi: fffffe0bd2770000 rsi: fffffe0b8aa06370 and while accessing address in rsi, it got fault. you can try to see ::stack -t and startup_modules::dis (last one for proximity of 10a).
20:32:17 startup_kernel: bi->bi_smbios is 0x0
20:33:07 so we're going to do the walk to find the smbios
20:33:51 unfortunately, the logic after that is too hairy for me to follow in text without a machine
20:34:50 Smithx10: I would break in smbios_open, and step until we isolate the bad copy (unless you're familiar with mdb, in which case you could do it faster?)
20:35:39 if you're _very_ unfamiliar with mdb you want to boot with `-kd` to enter the debugger as soon as possible, and then `smbios_open:b` to set the breakpoint, and `:c` to continue. When you hit the breakpoint "::step" will step, "::step over" will step, but not into calls, and ":c" will continue. You could probably `::step over` until it crashes
20:36:10 jbk: if you're actually here and thinking x86-y thoughts, maybe you're better at this?
20:37:47 richlowe: i am very unfamiliar with mdb, only used it to debug userspace applications typically
20:38:13 once you get in there, it's very similar :)
20:38:41 https://gist.github.com/Smithx10/96ef0efbc0f96eb875cdb62ba9082170
20:39:15 uh
20:39:44 hm
20:39:46 i can't think of anything better offhand.. but looking at smbios_open() we only appear to start doing bcopy()s once we think we've found the table
20:40:18 jbk: yeah, I'm just not sure I can work out which it is manually
20:40:29 it'd be nice if bcopy set up frame pointers so we could easily see which bcopy is the culprit
20:40:30 but my x86 is rusty
20:40:52 bi_smbios is 0 with BIOS system as dboot does populate bi_smbios from efi system table. and yes, either we have smb2 or smb3.
20:40:57 i think i might have once or twice dumped the stack and figured out a similar issue, but that was years ago :)
20:41:10 unfortunately, skipping the framepointer there is probably worthwhile (or was)
20:41:28 on arm, I have been adding them whenever I needed one
20:41:38 they can always go away if they matter
20:41:46 tsoome_: so perhaps it might be worth trying EFI boot instead of bios?
20:41:58 yeah, that's what I'm thinking
20:42:03 quite likely.
20:42:07 well, it might boot
20:42:10 but that's not really a solution
20:42:26 richlowe: i still have a branch i haven't touched in forever that does that for userland, but i never felt like i was able to get good measurements on the impact
20:42:28 certainly something worth adding to a bug report.
20:42:45 but it's probably best to actually get to the bottom of it
20:43:10 jbk: if I were doing it on x86 I'd see if I could condition it on DEBUG
20:43:22 or on _something_, I'm not really keen on DEBUG making that big a difference
20:44:03 it's like how some people want SOURCEDEBUG=yes to imply -O0, and it's like... at what point are you changing the thing you're debugging so far that you're debugging _something else_ now
20:44:17 the one data point i have is that at least building all of pkgsrc didn't seem to have any noticeable impact on build times (jperkin was kind enough to test it back after i did it)
20:45:31 or i guess had.. it's been long enough not sure how valid it'd be anymore
20:45:31 smb2 or smb3 can be figured out from disasm and register values, but because we do get to bcopy, one was set, and getting fault means the table(s) are not having values we assume they should have...
20:47:12 Yeah it's booting
20:48:23 you can try smbios command to check out the tables
20:50:44 tsoome_: how do the tables get into physmem in the first place? Would they land in the same spot in both a bios and efi boot?
20:51:21 it seems unlikely, but it'd be convenient
20:51:26 i'm not quite sure the same spot is guaranteed.
20:52:57 i'm guessing it's probably a property of whatever ovmf image they're using...
20:52:57 also I think there have been some funny bugs around like finding 64-bit pointer where 32-bit pointer was expected (or vice-versa)
20:54:05 I'd say, if there is CSM, you probably want to stick with UEFI (because the BIOS emulation often is buggy)
20:57:46 but, on this crash -- if the fault was for pointer in rsi, it's the second argument to bcopy, or pointer to destination, and that one should be allocated with known size and the same size should be used by bcopy. Therefore it is making me wonder ....
20:58:00 (unless I misread something:)
21:01:26 integer problems?
21:02:21 doesn't look easily possible
21:03:57 no idea, perhaps. maybe ::stack -t would tell more, but likely some ::bp and stepping with mdb would reveal.. or if there are enough resources, build with debug printouts:)
21:10:15 I can't follow the 2 v. 3 behaviour in the copying
21:10:42 look at line ~155
21:15:06 a look at the size param would be good, too
21:16:19 rdx is _way_ too big
21:17:05 I don't see us being careful about garbage data
21:17:16 but I think we're careful _enough_
21:23:46 I guess, it should be smbe_stlen and therefore smb3, as smbe_stlen in 2.1 is 16 bit int.
21:35:08 and it's capped to SMB_ENTRY_MAXLEN, or is that innately (or should be)
21:35:28 it'd be good to get the broken system broken again and in the debugger
21:35:51 yea. so it means some ::bp and ::next/::step::cont is in order:)
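
Note on the suspected failure mode: the discussion above ends with a guess that a firmware-supplied table length (32-bit in the SMBIOS 3.0 entry point, 16-bit in 2.1) fed an oversized count to bcopy. The following is a minimal sketch of that kind of sanity check, not the illumos code. It assumes the 24-byte `_SM3_` entry point layout from the DMTF SMBIOS specification; `parse_sm3_entry`, `make_sm3_entry` and the `MAX_TABLE_LEN` cap are hypothetical names and values chosen for illustration, and the cap is not the value of SMB_ENTRY_MAXLEN.

```python
import struct

SM3_ANCHOR = b"_SM3_"
SM3_ENTRY_LEN = 24        # size of the SMBIOS 3.0 (64-bit) entry point
MAX_TABLE_LEN = 1 << 20   # hypothetical sanity cap, NOT illumos's SMB_ENTRY_MAXLEN


def parse_sm3_entry(buf, max_table_len=MAX_TABLE_LEN):
    """Validate an _SM3_ entry point; return (table_addr, table_max_len)."""
    if len(buf) < SM3_ENTRY_LEN or buf[:5] != SM3_ANCHOR:
        raise ValueError("no _SM3_ anchor")
    ep_len = buf[6]  # entry point length field
    if ep_len < SM3_ENTRY_LEN or ep_len > len(buf):
        raise ValueError("bad entry point length")
    # All bytes of the entry point must sum to zero modulo 256.
    if sum(buf[:ep_len]) & 0xFF != 0:
        raise ValueError("bad checksum")
    # Offset 12: 32-bit table maximum size; offset 16: 64-bit table address.
    table_max_len, table_addr = struct.unpack_from("<IQ", buf, 12)
    # Refuse to trust a length that would make the later copy run wild.
    if table_max_len == 0 or table_max_len > max_table_len:
        raise ValueError("implausible table length")
    return table_addr, table_max_len


def make_sm3_entry(table_addr, table_max_len):
    """Build a checksummed entry point, only for exercising the parser."""
    raw = bytearray(struct.pack(
        "<5sBBBBBBBIQ",
        SM3_ANCHOR, 0, SM3_ENTRY_LEN,  # anchor, checksum placeholder, length
        3, 0, 0, 1, 0,                 # major, minor, docrev, revision, reserved
        table_max_len, table_addr))
    raw[5] = (-sum(raw)) & 0xFF        # make the byte sum come out to zero
    return bytes(raw)
```

The point of the sketch is only that the length is checked before anything is copied; whether the real fault came from this field is still unconfirmed in the log.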