00:02:58 <tp512> got a completely different kernel panic this time :/ I really hope the hardware isn't causing problems
00:03:30 <tp512> two ThinkPads in a row with hardware issues isn't something I want to deal with
00:04:19 <tp512> this time linux_rcu_cleaner_func is causing a page fault
00:04:43 <tp512> hmm, y'know, maybe I should run a RAM test
01:58:08 <tp512> so the extended memory test in the Lenovo diagnostics utility reported no issues with this RAM. I guess I either have to run memtest86 for a super thorough test, or I try running Linux on this, see if it randomly page faults in the same way, which would be a strong indicator of hardware issues
02:06:46 <tp512> so two of these page faults are in linux_rcu_cleaner_func. I dunno how likely it'd be for the exact same spot in code to crash if hardware issues are to blame. unless the memory layout is identical between these reboots
03:06:46 <tp512> so I upgraded to the latest 14-STABLE on pkgbase, since that also gives me the debug stuff that I was apparently lacking before. it doesn't seem like I have any option to boot into a debug kernel though? I thought I saw it install one
03:07:41 <tp512> dunno if this is actually going to improve debugging this since the issue was things being optimized out
03:36:55 <f451> hi, Can an amd64 zfs host be used to tftp & nfs a diskless arm64.aarch64 system?
03:39:34 <rwp> tftp and nfs are both architecture-independent protocols.  To them everything is just binary files on disk.  So yes, they can.
03:45:54 <mns> f451: yeah, I don't see a reason not to.  We used to do that in the 90s.  Had one server that hosted the images for SPARC, POWER, PA-RISC, Alpha, etc and booted the various systems off that system.  Should be able to do it even today.
04:51:33 <tp512> closing in on 2hr uptime on 14-STABLE through pkgbase's base_latest repo, no page faults yet but if it's gonna happen I'm guessing it'll be soon, will be interesting to see if it occurs in linux_rcu_cleaner_func again
05:04:09 <f451> mns: thanks. i was worried about endian-ness. i could x-compile the arm64 from the amd64 server i guess
05:05:27 <kevans> these are both LE platforms anyways on FreeBSD
05:05:51 <f451> LE?
05:05:54 <kevans> little endian
05:06:16 <f451> ah, sorry
05:06:43 <kevans> our only big endian platforms are powerpc these days, and there's a powerpc64le variant that some use
05:07:15 <mns> even the sparc is little endian ?
05:08:26 <kevans> sparc is gone
05:08:30 <kevans> mips is gone
05:08:54 <kevans> well, mips is still in stable/13, but sparc isn't there
05:08:58 <f451> even mips64 :
05:11:23 <mns> I meant sparc64, but yeah all sparc are gone, I forgot about that
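A small sketch of the endianness point made above: byte order only matters when a multi-byte value is interpreted, not when a file is stored or served. The value `0x11223344` and the in-memory buffer standing in for the NFS export are illustrative assumptions, not anything from the conversation.

```python
import io
import struct
import sys

value = 0x11223344  # arbitrary example value

little = struct.pack("<I", value)  # little-endian layout (amd64, arm64 on FreeBSD)
big = struct.pack(">I", value)     # big-endian layout (e.g. big-endian powerpc)

print("host byte order:    ", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:   ", big.hex())

# Stand-in for a file on the server: bytes go in, the same bytes come out.
# The server never reinterprets them, so its own byte order is irrelevant.
export = io.BytesIO()
export.write(little)
export.seek(0)
served = export.read()

# Only the client decodes the bytes, using the order the file was written in.
decoded = struct.unpack("<I", served)[0]
print("client decodes:     ", hex(decoded))
```

Since both amd64 and arm64 are little endian on FreeBSD, the question would not come up here even for data that is interpreted on both sides.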
05:11:46 <f451> i wonder how quick the arm64 would be if the nfs host was ktls
05:13:16 <f451> no aes or similar hardware
05:24:23 <mns> no idea, I don't have any arm64 systems, just a small amd64 system
05:25:23 <mns> I got to figure out when to upgrade to 13.3-RELEASE
05:51:36 <tp512> happened again, fault in the exact same place in linux_rcu_cleaner_func
05:51:56 <tp512> trying drm-510-kmod now since I think that's the only thing I have running currently using linuxkpi
05:53:42 <kevans> any lkpi wifi there?
05:54:13 <kevans> iwlwifi, rtw88
05:54:24 <tp512> I'm using iwm
05:54:46 <tp512> iwlwifi supports this chipset (Intel AC 9260) but I haven't felt a pressing need to try out iwlwifi yet
05:55:22 <tp512> iwm is a bit sluggish to connect to APs sometimes but it gets there eventually
05:56:12 <tp512> hmm, FreeBSD does load if_iwlwifi.ko though
05:57:35 <tp512> I wonder if it could be causing issues even though it's not actually doing much. I guess I'll blacklist it if I still get page faults after this switch to drm-510
05:58:52 <tp512> I have not seen the ZFS-related page faults since I stopped running my browser session on an encrypted dataset, so maybe these are separate issues
06:00:15 <tp512> still hopeful this is just a FreeBSD problem, seems less annoying than the hardware being faulty and me having to return this and get a third laptop in this ordeal around replacing my old Latitude
06:00:17 <kevans> iirc iwlwifi wins the probe if both get loaded
06:00:26 <kevans> better check sysctl net.wlan.devices
06:00:45 <tp512> net.wlan.devices: iwm0
06:01:08 <kevans> ok, sorry for the paranoia :-)
06:01:36 <tp512> I've got wlans_iwm0="wlan0", but I think the installer put it there
06:02:48 <tp512> could see iwlwifi still causing issues assuming it's putting things in the linux_rcu queue or cache or whatever that subsystem is doing
06:03:09 <tp512> kernel code is a bit out of my depth even though I did look at the offending lines
06:03:29 <kevans> it shouldn't be doing anything rcu, but yeah- better safe than sorry
06:04:00 <tp512> so amdgpu is basically *the* culprit for rcu stuff?
06:04:07 <kevans> (rcu is an object lifetime / mutation thing, kind of like (and probably inspired) our epoch(9))
06:04:44 <tp512> in both page faults (rcu and the zfs zap stuff) it looked like the kernel was page faulting dereferencing a struct that had a function pointer
06:04:52 <kevans> if the extent of your use of iwlwifi is "it's loaded" then yeah, I'd be surprised if iwlwifi's doing anything there
06:05:26 <tp512> though at least the rcu pagefault, it seems like it's dereferencing the struct just fine, but the FP is invalid
06:05:45 <tp512> the fault virtual address is identical to the instruction pointer
06:07:13 <tp512> yeah, the invalid instruction pointer shows in the backtrace, immediately followed by "signal handler called" and the page fault trap + panic stuff
06:20:53 <tp512> in the case of the ZFS zap stuff, the instruction pointer doesn't seem to be the issue, instead it seems to be the struct. fault virtual address is 0x458 which pretty obviously stands out as something that's probably not an address
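One common explanation for a tiny fault address such as 0x458 is a NULL struct pointer plus the offset of the member being accessed. The sketch below only illustrates that arithmetic. `ZapLike` is a hypothetical layout, not the real ZFS structure, with padding chosen so that the member lands at 0x458.

```python
import ctypes


class ZapLike(ctypes.Structure):
    # Hypothetical layout for illustration only: the padding size is picked
    # so that the pointer member ends up at offset 0x458.
    _fields_ = [
        ("padding", ctypes.c_ubyte * 0x458),
        ("callback", ctypes.c_void_p),
    ]


base = 0  # a NULL struct pointer
fault_address = base + ZapLike.callback.offset

print("member offset:", hex(ZapLike.callback.offset))
print("address touched through a NULL base:", hex(fault_address))
```

If that reading is right, the interesting question is why the struct pointer was NULL, not what lives at 0x458.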
11:29:37 <Bheam> yo. i need some help. i'm having trouble with high disk io (htop shows constantly 100-200% disk usage). I'm running a windows vm that i think is the culprit, as I've run the same software on a pure windows install before and run into the same problems. I've earlier assumed disk issues or raid rebuild issues - but now i'm running on zfs and getting the same issues
11:30:10 <Bheam> what tools should i use to try to narrow down the bottleneck?
11:30:44 <Bheam> it's ssd drives and the throughput seems to be stuck at like 5MB / sec, which is impossibly low even if it were random read/writes
11:41:17 <jgh> sounds unlikely, but SSDs slow down for writes when they're near full
11:46:31 <Bheam> i'm using image based disk in bhyve, and windows vm shows no indication as to why it's running at 100% disk (it's not reporting how much disk io it's using properly)
11:47:53 <Bheam> image is growing dynamically. but it's not growing (fast) at the moment. vm disk is 200gb, i've used about 50gb. vm has 150gb available. host has 200gb available
11:51:48 <jgh> I'm no expert in BSD filesystem implementations, but I'm wondering if they use TRIM.  Might be worth some research for you
11:53:11 <jgh> Any bias in R vs. W service times would be of interest.  Is "iostat" available?
11:56:54 <Bheam> on host yes
11:58:00 <jgh> "iostat -dxz" on an active system is a go-to for me
11:58:22 <jgh> wups, "iostat -dxz 10"
12:00:23 <Bheam> https://paste.mozilla.org/gCLKHNdV
12:00:24 <VimDiesel> Title: Mozilla Community Pastebin/gCLKHNdV (Console/Bash Session)
12:00:30 <Bheam> what does that tell you :p
12:02:43 <jgh> that the iostat is far less informative than the linux one, and that your disk(s) - are they both on one phys? - are not performing well
12:04:27 <jgh> it doesn't give service times, even without a R/W split, unfortunately
12:05:21 <Bheam> not sure if they are both on one phys, how do i tell
12:05:34 <jgh> but if that's a typical sample, 100% busy disk just isn't good
12:05:58 <Bheam> problem is it keeps happening with different physical computers
12:06:00 <jgh> do you have >1 SSD?
12:06:03 <Bheam> yea 2
12:06:06 <Bheam> in a zpool
12:06:26 <Bheam> installing smartmon
12:06:43 <jgh> must be a config thing, if it's on several systems
12:07:54 <tykling> have you checked with gstat(8)? have you checked "top -m io"? if this is zfs and a vm have you considered volblocksize of the zvol vs blocksize in the vm fs, a mismatch can cause bad times
12:10:25 <Bheam> top -m io seems to not display everything as it's most of the time at 0% but sometimes jumps to 100%, while htop is almost always at 100+%
12:11:22 <Bheam> also top -m io shows something when i launch it, looks like a total, then switches to showing very low values
12:15:19 <Bheam> tykling: it's an issue that started after a few weeks of running the vm. also it's so bad i don't think blocksize is the issue
12:17:53 <Bheam> does zfs ever rebuild or resync zpool mirror ?
12:18:07 <jgh> still worth checking, that blocksize
12:18:24 <Bheam> zpool status shows all online and no known data errors
12:18:53 <Bheam> all block sizes should be 512, not sure how to check zvol
12:19:42 <Bheam> also i've been getting the same issue on other computers running windows / ntfs
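A rough sketch of why a block size mismatch can hurt: when the guest overwrites a small block inside a larger host-side block, the host may have to read and rewrite the whole larger block. The numbers are assumptions, not measurements from this system: 128 KiB is the ZFS default recordsize (which is what applies to a file-based image, as opposed to volblocksize on a zvol), and 4 KiB is a typical NTFS cluster size. This is a worst case that ignores caching, compression and write aggregation.

```python
def worst_case_amplification(host_block_size: int, guest_write_size: int) -> float:
    """Bytes the host may rewrite per byte the guest asked to write."""
    if guest_write_size >= host_block_size:
        return 1.0
    return host_block_size / guest_write_size


KiB = 1024
guest_write = 4 * KiB  # assumed guest filesystem block size

for host_block in (4 * KiB, 16 * KiB, 128 * KiB):
    factor = worst_case_amplification(host_block, guest_write)
    print(f"{host_block // KiB:>3} KiB host block, 4 KiB guest write: up to {factor:.0f}x")
```

A factor like this would make things slower but would not by itself explain throughput dropping after a few weeks, so it is one thing to rule out, not a diagnosis.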
12:44:25 <Bheam> hmm isn't average block erase count over 2 months @ raw value 68 pretty high?
12:45:05 <tp512> 7hr uptime after switching to drm-510-kmod, had dozed off for some of this time, but I was watching quite a bit of youtube which seemed to accelerate this issue triggering when I was on drm-515-kmod
12:45:31 <tp512> I think I might've found the culprit, unless I've just gotten really lucky about not triggering a page fault
12:45:41 <tp512> normally it happens in under 3hr
12:48:32 <tp512> more like unlucky in this case I guess because if it does happen on drm-510-kmod I want it to not waste my time and do it already, so I can know whether the switch has resolved things
12:51:53 <tp512> will test further tomorrow, and see if I have time to send in a bug report
12:56:41 <tykling> Bheam: try ordering by the "total" column when running "top -m io" (press "o" and type "total" and press enter inside top), also, if you enabled system hardening you need to run top as root to see everything
13:03:11 <Bheam> tykling: not sure how to interpret this: https://paste.mozilla.org/1bkTZadi
13:03:12 <VimDiesel> Title: Mozilla Community Pastebin/1bkTZadi (Console/Bash Session)
13:06:24 <tykling> Bheam: press "a"
13:06:37 <tykling> Bheam: a single bhyve process is using 100% of your io
13:06:50 <tykling> if you press a you can see the whole commandline of stuff
13:06:50 <Bheam> i know
13:07:05 <Bheam> i have nothing running on host, everything is in vm
13:07:40 <Bheam> the host does the actual hw io for disks though, since the vm is using file based image
13:18:33 <jgh> have you tested the host itself, as opposed to what a VM sees of it?
13:26:31 <Bheam> jgh: i know disk io is an issue as the host is lagging too when the vm struggles
13:30:54 <jgh> that... does not quite answer the question
13:31:51 <luna_> https://www.nuug.no/aktiviteter/20240312-freebsd-and-absurdity-of-security/ FreeBSD and Security talk in Oslo later today, in 4 hours. not sure if it's in English or Norwegian, however
13:31:52 <VimDiesel> Title: FreeBSD and the absurdities of security compliance
13:37:18 <nimaje> is there some stream? as the abstract is in english I would expect the talk to be in english too, well, if there is a stream I will see that later
13:44:29 <luna_> nimaje: think they will stream on their Youtube channel
13:44:45 <luna_> https://www.youtube.com/@nuug/streams
13:44:47 <VimDiesel> Title: NUUG - YouTube
13:47:20 <luna_> asked in their irc channel coming back if i get an answer
13:49:27 <luna_> "Foredraget vil foregå på engelsk." helps if one can read Norwegian text: it will be held in English
14:39:00 <voy4g3r2> throwing this idea out here, this is how i would like to setup my network topology.. does anyone see GLARING issues with this? https://1drv.ms/i/s!Ag86nuiRCza3jahmMr_UcUmhyKk9Lg?e=pOg2o3 objective: To have two networks, one with general purpose services and one that has backup/server services that are NOT available to the general network team.. only servers..
16:58:27 <nimaje> stream for the nuug talk later: https://youtube.com/live/DkepLbF5eKg?feature=share
16:58:29 <VimDiesel> Title: FreeBSD and the absurdities of security compliance,med Eirik Øverby - YouTube
16:59:41 <luna_> ah nimaje you were faster
17:01:21 <ZedHedTed> that looks fun
19:47:22 <thegman> does iwlwifi(4) support hostap
19:47:33 <thegman> ive been trying to setup an opnsense router in a bhyve vm
19:52:28 <tp512> so if I'm gonna send in a bug report regarding these page faults, I'm guessing I send it to drm-kmod, since it seems like the 515 drivers trigger it but the 510 drivers don't, despite the fact that the actual fault occurs in linux_rcu?
19:53:25 <tp512> without me building like an unoptimized debug kernel and actually diving in I'm not sure if drm-515 is doing what it's supposed to and linux_rcu is breaking, or the other way around
20:06:53 <rtprio> thegman: i don't see why not; what happens when you try hostap
20:09:03 <johnjaye> tp512: how much kernel knowledge do you need to do that kind of analysis?
20:12:48 <tp512> johnjaye: honestly not sure. I'd need to figure out where the dangling pointer comes from
20:16:05 <thegman> im not booted into freebsd right now but ifconfig caps wlan0 didnt show anything related to hostap
20:16:16 <tp512> pretty sure a dangling pointer is to blame. linux_rcu_cleaner_func goes through a linked list to "dispatch callbacks", but one of the function pointers in this list doesn't point to a valid area of memory, kernel execution jumps to it and immediately page faults
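A toy model of the failure mode described above, assuming nothing about the real linuxkpi code: deferred callbacks are queued as raw addresses, the code behind one of them goes away while it is still queued, and the dispatcher jumps to it anyway. All names and addresses are made up. In a real kernel the stale pointer could just as well come from freed or overwritten data.

```python
class PageFault(Exception):
    """Raised when the toy kernel jumps to an unmapped address."""


# Toy "kernel text": address -> function. Addresses are invented.
mapped_code = {
    0x1000: lambda: "freed object A",
    0x2000: lambda: "freed object B",
}

# Deferred callbacks waiting for the cleaner, stored as raw addresses
# the way a C list would store function pointers.
pending_callbacks = [0x1000, 0x2000]


def unload_code(address):
    """Model the code behind a callback disappearing while still queued."""
    del mapped_code[address]


def run_cleaner(callbacks):
    """Walk the list and call every entry, like a callback dispatcher."""
    results = []
    for address in callbacks:
        if address not in mapped_code:
            # Jumping to an unmapped address: the faulting address and the
            # instruction pointer are the same value.
            raise PageFault(
                f"fault virtual address = instruction pointer = {address:#x}"
            )
        results.append(mapped_code[address]())
    return results


unload_code(0x2000)  # the callback target vanishes, the queue entry stays
try:
    run_cleaner(pending_callbacks)
except PageFault as fault:
    print("panic:", fault)
```

The model also shows why the crash site says little about the culprit: the dispatcher is where the bad pointer is used, while the bug is wherever the entry was queued or invalidated.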
20:16:20 <thegman> and hostap didnt work when i tried it although i did probably configure it incorrectly
20:17:48 <tp512> I don't know exactly how amdgpu is interacting with the rcu stuff, I'd probably need to make a debug build of that module as well and hope that having so much unoptimized code running doesn't actually get in the way of using this laptop until a crash happens
20:18:18 <thegman> if i remember right hostap tried to start then it disabled wlan0 then it closed saying "wlan0 is down"
20:52:46 <thegman> i finally got around to moving my irssi config to freebsd so i dont have to keep rebooting
20:55:52 <thegman> ifconfig wlan0 create wlandev iwlwifi0 wlanmode hostap says "ifconfig: SIOCIFCREATE2 (wlan0): Operation not supported"
21:30:39 <CrtxReavr> Which one of you assholes signed me up for Fox News alerts?
22:57:51 <tp512> so FreeBSD doesn't support the Xbox Series S|X controllers? or is this just SDL being a pain?
22:58:26 <tp512> seems weird to me that it's not showing up in SDL applications but this PS5 controller apparently works just fine
23:09:38 <tp512> pretty sure even NetBSD supported the Series X|S controller just out of the box, so I wonder if the driver might be able to be ported over
23:58:04 <mvee> I'm trying to remove a jail. Location is /usr/local/share/classic   jls shows it is not running. When i try  chflags -R 0 /usr/local/jails/classic  I get "No such file or directory". How do I remove this jail?