00:02:58 got a completely different kernel panic this time :/ I really hope the hardware isn't causing problems
00:03:30 two ThinkPads in a row with hardware issues isn't something I want to deal with
00:04:19 this time linux_rcu_cleaner_func is causing a page fault
00:04:43 hmm, y'know, maybe I should run a RAM test
01:58:08 so the extended memory test in the Lenovo diagnostics utility reported no issues with this RAM. I guess I either have to run memtest86 for a super thorough test, or I try running Linux on this, see if it randomly page faults in the same way, which would be a strong indicator of hardware issues
02:06:46 so two of these page faults are in linux_rcu_cleaner_func. I dunno how likely it'd be for the exact same spot in code to crash if hardware issues are to blame. unless the memory layout is identical between these reboots
03:06:46 so I upgraded to the latest 14-STABLE on pkgbase, since that also gives me the debug stuff that I was apparently lacking before. it doesn't seem like I have any option to boot into a debug kernel though? I thought I saw it install one
03:07:41 dunno if this is actually going to improve debugging this since the issue was things being optimized out
03:36:55 hi, Can an amd64 zfs host be used to tftp & nfs a diskless arm64.aarch64 system?
03:39:34 tftp and nfs are both architecture independent protocols. To them everything is just binary files on disk. So, yes they can.
03:45:54 f451: yeah, I don't see a reason not to. We used to do that in the 90s. Had one server that hosted the images for SPARC, POWER, PA-RISC, Alpha, etc and booted the various systems off that system. Should be able to do it even today.
04:51:33 closing in on 2hr uptime on 14-STABLE through pkgbase's base_latest repo, no page faults yet but if it's gonna happen I'm guessing it'll be soon, will be interesting to see if it occurs in linux_rcu_cleaner_func again
05:04:09 mns: thanks. i was worried about endian-ness.
i could x-compile the arm64 from the amd64 server i guess
05:05:27 these are both LE platforms anyways on FreeBSD
05:05:51 LE?
05:05:54 little endian
05:06:16 ah, sorry
05:06:43 our only big endian platforms are powerpc these days, and there's a powerpc64le variant that some use
05:07:15 even the sparc is little endian ?
05:08:26 sparc is gone
05:08:30 mips is gone
05:08:54 well, mips is still in stable/13, but sparc isn't there
05:08:58 even mips64 :
05:11:23 I meant sparc64, but yeah all sparc are gone, I forgot about that
05:11:46 i wonder how quick the arm64 would be if the nfs host was ktls
05:13:16 no aes or similar hardware
05:24:23 no idea, I don't have any arm64 systems, just a small amd64 system
05:25:23 I got to figure out when to upgrade to 13.3-RELEASE
05:51:36 happened again, fault in the exact same place in linux_rcu_cleaner_func
05:51:56 trying drm-510-kmod now since I think that's the only thing I have running currently using linuxkpi
05:53:42 any lkpi wifi there?
05:54:13 iwlwifi, rtw88
05:54:24 I'm using iwm
05:54:46 iwlwifi supports this chipset (Intel AC 9260) but I haven't felt a pressing need to try out iwlwifi yet
05:55:22 iwm is a bit sluggish to connect to APs sometimes but it gets there eventually
05:56:12 hmm, FreeBSD does load if_iwlwifi.ko though
05:57:35 I wonder if it could be causing issues even though it's not actually doing much.
I guess I'll blacklist it if I still get page faults after this switch to drm-510
05:58:52 I have not seen the ZFS-related page faults since I stopped running my browser session on an encrypted dataset, so maybe these are separate issues
06:00:15 still hopeful this is just a FreeBSD problem, seems less annoying than the hardware being faulty and me having to return this and get a third laptop in this ordeal around replacing my old Latitude
06:00:17 iirc iwlwifi wins the probe if both get loaded
06:00:26 better check sysctl net.wlan.devices
06:00:45 net.wlan.devices: iwm0
06:01:08 ok, sorry for the paranoia :-)
06:01:36 I've got wlans_iwm0="wlan0", but I think the installer put it there
06:02:48 could see iwlwifi still causing issues assuming it's putting things in the linux_rcu queue or cache or whatever that subsystem is doing
06:03:09 kernel code is a bit out of my depth even though I did look at the offending lines
06:03:29 it shouldn't be doing anything rcu, but yeah- better safe than sorry
06:04:00 so amdgpu is basically *the* culprit for rcu stuff?
06:04:07 (rcu is an object lifetime / mutation thing, kind of like (and probably inspired) our epoch(9))
06:04:44 in both page faults (rcu and the zfs zap stuff) it looked like the kernel was page faulting dereferencing a struct that had a function pointer
06:04:52 if the extent of your use of iwlwifi is "it's loaded" then yeah, I'd be surprised if iwlwifi's doing anything there
06:05:26 though at least the rcu pagefault, it seems like it's dereferencing the struct just fine, but the FP is invalid
06:05:45 the fault virtual address is identical to the instruction pointer
06:07:13 yeah, the invalid instruction pointer shows in the backtrace, immediately followed by "signal handler called" and the page fault trap + panic stuff
06:20:53 in the case of the ZFS zap stuff, the instruction pointer doesn't seem to be the issue, instead it seems to be the struct.
fault virtual address is 0x458 which pretty obviously stands out as something that's probably not an address
11:29:37 yo. i need some help. i'm having trouble with high disk io (htop shows constantly 100-200% disk usage). I'm running a windows vm that i think is the culprit, as I've run the same software on a pure windows install before and come into the same problems. I've earlier assumed disk issues or raid rebuild issues - but now i'm running on zfs and getting the same issues
11:30:10 what tools should i use to try narrow down the bottleneck?
11:30:44 it's ssd drives and the throughput seems to be stuck at like 5MB / sec, which is impossibly low even if it were random read/writes
11:41:17 sounds unlikely, but SSDs slow down for writes when they're near full
11:46:31 i'm using image based disk in bhyve, and windows vm shows no indication as to why it's running at 100% disk (it's not reporting how much disk io it's using properly)
11:47:53 image is growing dynamically. but it's not growing (fast) at the moment. vm disk is 200gb, i've used about 50gb. vm has 150gb available. host has 200gb available
11:51:48 I'm no expert in BSD filesystem implementations, but I'm wondering if they use TRIM. Might be worth some research for you
11:53:11 Any bias in R vs. W service times would be of interest. Is "iostat" available?
11:56:54 on host yes
11:58:00 "iostat -dxz" on an active system is a go-to for me
11:58:22 wups, "iostat -dxz 10"
12:00:23 https://paste.mozilla.org/gCLKHNdV
12:00:24 Title: Mozilla Community Pastebin/gCLKHNdV (Console/Bash Session)
12:00:30 what does that tell you :p
12:02:43 that the iostat is far less informative than the linux one, and that your disk(s) - are they both on one phys?
- are not performing well
12:04:27 it doesn't give service times, even without a R/W split, unfortunately
12:05:21 not sure if they are both on one phys, how do i tell
12:05:34 but if that's a typical sample, 100% busy disk just isn't good
12:05:58 problem is it keeps happening with different physical computers
12:06:00 do you have >1 SSD?
12:06:03 yea 2
12:06:06 in a zpool
12:06:26 installing smartmon
12:06:43 must be a config thing, if it's on several systems
12:07:54 have you checked with gstat(8)? have you checked "top -m io"? if this is zfs and a vm have you considered volblocksize of the zvol vs blocksize in the vm fs, a mismatch can cause bad times
12:10:25 top -m io seems to not display everything as it's most of the time at 0% but sometimes jumps to 100%, while htop is almost always at 100+%
12:11:22 also top -m io shows something when i launch it, looks like a total, then switches to showing very low values
12:15:19 tykling: it's an issue that started after a few weeks of running the vm. also it's so bad i don't think blocksize is the issue
12:17:53 does zfs ever rebuild or resync zpool mirror ?
12:18:07 still worth checking, that blocksize
12:18:24 zpool status shows all online and no known data errors
12:18:53 all block sizes should be 512, not sure how to check zvol
12:19:42 also i've been getting the same issue on other computers running windows / ntfs
12:44:25 hmm isn't average block erase count over 2 months @ raw value 68 pretty high?
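[editor's note] The block-size mismatch raised at 12:07:54 can be put into rough numbers. The sketch below is only a worst-case model, not a measurement: it assumes every guest write smaller than the host-side block forces a read-modify-write of the whole block, and it ignores caching, compression and write aggregation. The function name and the sizes in the usage lines are illustrative assumptions, not values taken from the log. On the host, the properties to compare would be `recordsize` for a file-backed image or `volblocksize` for a zvol, both readable with `zfs get`.

```python
def worst_case_amplification(backing_block: int, guest_io: int) -> float:
    """Bytes the host may rewrite per byte the guest writes (rough upper bound).

    backing_block: host-side block size in bytes (hypothetical value).
    guest_io:      size of one guest write in bytes (hypothetical value).
    """
    if backing_block <= 0 or guest_io <= 0:
        raise ValueError("sizes must be positive")
    # Assumes aligned writes: a write touches ceil(guest_io / backing_block)
    # blocks, and each touched block is rewritten in full.
    blocks_touched = -(-guest_io // backing_block)  # ceiling division
    return blocks_touched * backing_block / guest_io


# Illustrative sizes only: 4 KiB guest writes on 128 KiB vs 4 KiB host blocks.
print(worst_case_amplification(128 * 1024, 4096))
print(worst_case_amplification(4096, 4096))
```

A large ratio here would only suggest that the mismatch is worth checking; it would not by itself explain throughput stuck near 5MB/sec.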
12:45:05 7hr uptime after switching to drm-510-kmod, had dozed off for some of this time, but I was watching quite a bit of youtube which seemed to accelerate this issue triggering when I was on drm-515-kmod
12:45:31 I think I might've found the culprit, unless I've just gotten really lucky about not triggering a page fault
12:45:41 normally it happens in under 3hr
12:48:32 more like unlucky in this case I guess because if it does happen on drm-510-kmod I want it to not waste my time and do it already, so I can know whether the switch has resolved things
12:51:53 will test further tomorrow, and see if I have time to send in a bug report
12:56:41 Bheam: try ordering by the "total" column when running "top -m io" (press "o" and type "total" and press enter inside top), also, if you enabled system hardening you need to run top as root to see everything
13:03:11 tykling: not sure how to interpret this: https://paste.mozilla.org/1bkTZadi
13:03:12 Title: Mozilla Community Pastebin/1bkTZadi (Console/Bash Session)
13:06:24 Bheam: press "a"
13:06:37 Bheam: a single bhyve process is using 100% of your io
13:06:50 if you press a you can see the whole commandline of stuff
13:06:50 i know
13:07:05 i have nothing running on host, everything is in vm
13:07:40 the host does the actual hw io for disks though, since the vm is using file based image
13:18:33 have you tested the host itself, as opposed to what a VM sees of it?
13:26:31 jgh: i know disk io is an issue as the host is lagging too when the vm struggles
13:30:54 that... does not quite answer the question
13:31:51 https://www.nuug.no/aktiviteter/20240312-freebsd-and-absurdity-of-security/ FreeBSD and Security talk in Oslo later today in 4 hours, not sure if it's in English or Norwegian however
13:31:52 Title: FreeBSD and the absurdities of security compliance
13:37:18 is there some stream?
as the abstract is in english I would expect the talk to be in english too, well, if there is a stream I will see that later
13:44:29 nimaje: think they will stream on their Youtube channel
13:44:45 https://www.youtube.com/@nuug/streams
13:44:47 Title: NUUG - YouTube
13:47:20 asked in their irc channel coming back if i get an answer
13:49:27 Foredraget vil foregå på engelsk. helps if one can read Norwegian text, it will be held in english
14:39:00 throwing this idea out here, this is how i would like to setup my network topology.. does anyone see GLARING issues with this? https://1drv.ms/i/s!Ag86nuiRCza3jahmMr_UcUmhyKk9Lg?e=pOg2o3 objective: To have two networks, one with general purpose services and one that has backup/server services that are NOT available to the general network team.. only servers..
16:58:27 stream for the nuug talk later: https://youtube.com/live/DkepLbF5eKg?feature=share
16:58:29 Title: FreeBSD and the absurdities of security compliance,med Eirik Øverby - YouTube
16:59:41 ah nimaje you were faster
17:01:21 that looks fun
19:47:22 does iwlwifi(4) support hostap
19:47:33 ive been trying to setup an opnsense router in a bhyve vm
19:52:28 so if I'm gonna send in a bug report regarding these page faults, I'm guessing I send it to drm-kmod, since it seems like the 515 drivers trigger it but the 510 drivers don't, despite the fact that the actual fault occurs in linux_rcu?
19:53:25 without me building like an unoptimized debug kernel and actually diving in I'm not sure if drm-515 is doing what it's supposed to and linux_rcu is breaking, or the other way around
20:06:53 thegman: i don't see why not; what happens when you try hostap
20:09:03 tp512: how much kernel knowledge do you need to do that kind of analysis?
20:12:48 johnjaye: honestly not sure.
I'd need to figure out where the dangling pointer comes from
20:16:05 im not booted into freebsd right now but ifconfig caps wlan0 didnt show anything related to hostap
20:16:16 pretty sure a dangling pointer is to blame. linux_rcu_cleaner_func goes through a linked list to "dispatch callbacks", but one of the function pointers in this list doesn't point to a valid area of memory, kernel execution jumps to it and immediately page faults
20:16:20 and hostap didnt work when i tried it although i did probably configure it incorrectly
20:17:48 I don't know exactly how amdgpu is interacting with the rcu stuff, I'd probably need to make a debug build of that module as well and hope that having so much unoptimized code running doesn't actually get in the way of using this laptop until a crash happens
20:18:18 if i remember right hostap tried to start then it disabled wlan0 then it closed saying "wlan0 is down"
20:52:46 i finally got around to moving my irssi config to freebsd so i dont have to keep rebooting
20:55:52 ifconfig wlan0 create wlandev iwlwifi0 wlanmode hostap says "ifconfig: SIOCIFCREATE2 (wlan0): Operation not supported"
21:30:39 Which one of you assholes signed me up for Fox News alerts?
22:57:51 so FreeBSD doesn't support the Xbox Series S|X controllers? or is this just SDL being a pain?
22:58:26 seems weird to me that it's not showing up in SDL applications but this PS5 controller apparently works just fine
23:09:38 pretty sure even NetBSD supported the Series X|S controller just out of the box, so I wonder if the driver might be able to be ported over
23:58:04 I'm trying to remove a jail. Location is /usr/local/share/classic jls shows it is not running. When i try chflags -R 0 /usr/local/jails/classic I get "No such file or directory". How do I remove this jail?
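[editor's note] Going back to the dangling-pointer theory from 20:16:16: the toy model below shows why a stale function pointer in a callback list would produce a fault address equal to the instruction pointer, as observed at 06:05:45. It is a user-space illustration only, not the LinuxKPI code. `PageFault`, `dispatch_callbacks`, the `mapped_code` table and all addresses are hypothetical names and values invented for the sketch.

```python
class PageFault(Exception):
    """Stand-in for a kernel page fault in this toy model."""

    def __init__(self, fault_address: int, instruction_pointer: int):
        super().__init__(f"page fault at {fault_address:#x}")
        self.fault_address = fault_address
        self.instruction_pointer = instruction_pointer


def dispatch_callbacks(pending, mapped_code):
    """Call each queued callback by address, in list order.

    pending:     callback addresses queued for dispatch (hypothetical).
    mapped_code: address -> callable, standing in for mapped kernel text.
    """
    results = []
    for func_addr in pending:
        handler = mapped_code.get(func_addr)
        if handler is None:
            # The jump itself succeeds; the fault happens when the CPU tries
            # to fetch an instruction from the unmapped address. That is why
            # the fault address and the instruction pointer coincide.
            raise PageFault(fault_address=func_addr,
                            instruction_pointer=func_addr)
        results.append(handler())
    return results


# Hypothetical addresses: two valid callbacks and one stale pointer between them.
mapped_code = {0x1000: lambda: "freed object A", 0x2000: lambda: "freed object B"}
try:
    dispatch_callbacks([0x1000, 0xDEAD0000, 0x2000], mapped_code)
except PageFault as fault:
    print(hex(fault.fault_address),
          fault.fault_address == fault.instruction_pointer)
```

The model says nothing about where the stale pointer comes from, which is the part that would still need a debug build to track down.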