-
tp512
got a completely different kernel panic this time :/ I really hope the hardware isn't causing problems
-
tp512
two ThinkPads in a row with hardware issues isn't something I want to deal with
-
tp512
this time linux_rcu_cleaner_func is causing a page fault
-
tp512
hmm, y'know, maybe I should run a RAM test
-
tp512
so the extended memory test in the Lenovo diagnostics utility reported no issues with this RAM. I guess I either have to run memtest86 for a super thorough test, or I try running Linux on this, see if it randomly page faults in the same way, which would be a strong indicator of hardware issues
-
tp512
so two of these page faults are in linux_rcu_cleaner_func. I dunno how likely it'd be for the exact same spot in code to crash if hardware issues are to blame. unless the memory layout is identical between these reboots
-
tp512
so I upgraded to the latest 14-STABLE on pkgbase, since that also gives me the debug stuff that I was apparently lacking before. it doesn't seem like I have any option to boot into a debug kernel though? I thought I saw it install one
-
tp512
dunno if this is actually going to improve debugging this since the issue was things being optimized out
-
f451
hi, Can an amd64 zfs host be used to tftp & nfs a diskless arm64.aarch64 system?
-
rwp
tftp and nfs are both architecture independent protocols. To them everything is just binary files on disk. So, yes they can.
-
mns
f451: yeah, I don't see a reason not to. We used to do that in the 90s. Had one server that hosted the images for SPARC, POWER, PA-RISC, Alpha, etc and booted the various systems off that system. Should be able to do it even today.
-
tp512
closing in on 2hr uptime on 14-STABLE through pkgbase's base_latest repo, no page faults yet but if it's gonna happen I'm guessing it'll be soon, will be interesting to see if it occurs in linux_rcu_cleaner_func again
-
f451
mns: thanks. i was worried about endian-ness. i could x-compile the arm64 from the amd64 server i guess
-
kevans
these are both LE platforms anyways on FreeBSD
-
f451
LE?
-
kevans
little endian
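
(A minimal sketch of what byte order means here, using Python's standard `struct` module. The value 0x12345678 and the helper name `pack_u32` are made up for illustration. TFTP and NFS carry file bytes unchanged, so byte order only matters to whatever program interprets those bytes.)

```python
import struct
import sys

def pack_u32(value):
    """Return (little-endian, big-endian) encodings of a 32-bit unsigned int."""
    return struct.pack("<I", value), struct.pack(">I", value)

little, big = pack_u32(0x12345678)
print("little endian:", little.hex())   # least significant byte first
print("big endian:   ", big.hex())      # most significant byte first
print("this host is: ", sys.byteorder)  # "little" on amd64 and aarch64
```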
-
f451
ah, sorry
-
kevans
our only big endian platforms are powerpc these days, and there's a powerpc64el variant that some use
-
mns
even the sparc is little endian ?
-
kevans
sparc is gone
-
kevans
mips is gone
-
kevans
well, mips is still in stable/13, but sparc isn't there
-
f451
even mips64 :
-
mns
I meant sparc64, but yeah all sparc are gone, I forgot about that
-
f451
i wonder how quick the arm64 would be if the nfs host was ktls
-
f451
no aes or similar hardware
-
mns
no idea, I don't have any arm64 systems, just a small amd64 system
-
mns
I got to figure out when to upgrade to 13.3-RELEASE
-
tp512
happened again, fault in the exact same place in linux_rcu_cleaner_func
-
tp512
trying drm-510-kmod now since I think that's the only thing I have running currently using linuxkpi
-
kevans
any lkpi wifi there?
-
kevans
iwlwifi, rtw88
-
tp512
I'm using iwm
-
tp512
iwlwifi supports this chipset (Intel AC 9260) but I haven't felt a pressing need to try out iwlwifi yet
-
tp512
iwm is a bit sluggish to connect to APs sometimes but it gets there eventually
-
tp512
hmm, FreeBSD does load if_iwlwifi.ko though
-
tp512
I wonder if it could be causing issues even though it's not actually doing much. I guess I'll blacklist it if I still get page faults after this switch to drm-510
-
tp512
I have not seen the ZFS-related page faults since I stopped running my browser session on an encrypted dataset, so maybe these are separate issues
-
tp512
still hopeful this is just a FreeBSD problem, seems less annoying than the hardware being faulty and me having to return this and get a third laptop in this ordeal around replacing my old Latitude
-
kevans
iirc iwlwifi wins the probe if both get loaded
-
kevans
better check sysctl net.wlan.devices
-
tp512
net.wlan.devices: iwm0
-
kevans
ok, sorry for the paranoia :-)
-
tp512
I've got wlans_iwm0="wlan0", but I think the installer put it there
-
tp512
could see iwlwifi still causing issues assuming it's putting things in the linux_rcu queue or cache or whatever that subsystem is doing
-
tp512
kernel code is a bit out of my depth even though I did look at the offending lines
-
kevans
it shouldn't be doing anything rcu, but yeah- better safe than sorry
-
tp512
so amdgpu is basically *the* culprit for rcu stuff?
-
kevans
(rcu is an object lifetime / mutation thing, kind of like (and probably inspired) our epoch(9))
-
tp512
in both page faults (rcu and the zfs zap stuff) it looked like the kernel was page faulting dereferencing a struct that had a function pointer
-
kevans
if the extent of your use of iwlwifi is "it's loaded" then yeah, I'd be surprised if iwlwifi's doing anything there
-
tp512
though at least the rcu pagefault, it seems like it's dereferencing the struct just fine, but the FP is invalid
-
tp512
the fault virtual address is identical to the instruction pointer
-
tp512
yeah, the invalid instruction pointer shows in the backtrace, immediately followed by "signal handler called" and the page fault trap + panic stuff
-
tp512
in the case of the ZFS zap stuff, the instruction pointer doesn't seem to be the issue, instead it seems to be the struct. fault virtual address is 0x458 which pretty obviously stands out as something that's probably not an address
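
(A hedged sketch of why a tiny fault address like 0x458 often points at a NULL or near-NULL struct pointer: the faulting address is the base pointer plus the offset of the member being read. The struct layout below is entirely hypothetical, not the real ZFS zap structure; it just places a field at offset 0x458.)

```python
import ctypes

class FakeObject(ctypes.Structure):
    # Hypothetical layout: 0x458 bytes of other members, then the field
    # the code tries to read.
    _fields_ = [
        ("other_members", ctypes.c_ubyte * 0x458),
        ("target", ctypes.c_uint64),
    ]

def member_address(base, offset):
    """Address the CPU touches when reading a member through a base pointer."""
    return base + offset

null_base = 0  # a NULL struct pointer
fault_address = member_address(null_base, FakeObject.target.offset)
print(hex(fault_address))
```

With a NULL base the computed address is just the member offset, which is why such a value does not look like a kernel address.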
-
Bheam
yo. i need some help. i'm having trouble with high disk io (htop shows constantly 100-200% disk usage). I'm running a windows vm that i think is the culprit, as I've run the same software on a pure windows install before and come into the same problems. I've earlier assumed disk issues or raid rebuild issues - but now i'm running on zfs and getting the same issues
-
Bheam
what tools should i use to try narrow down the bottleneck?
-
Bheam
it's ssd drives and the throughput seems to be stuck at like 5MB / sec, which is impossibly low even if it were random read/writes
-
jgh
sounds unlikely, but SSDs slow down for writes when they're near full
-
Bheam
i'm using image based disk in bhyve, and windows vm shows no indication as to why it's running at 100% disk (it's not reporting how much disk io it's using properly)
-
Bheam
image is growing dynamically. but it's not growing (fast) at the moment. vm disk is 200gb, i've used about 50gb. vm has 150gb available. host has 200gb available
-
jgh
I'm no expert in BSD filesystem implementations, but I'm wondering if they use TRIM. Might be worth some research for you
-
jgh
Any bias in R vs. W service times would be of interest. Is "iostat" available?
-
Bheam
on host yes
-
jgh
"iostat -dxz" on an active system is a go-to for me
-
jgh
wups, "iostat -dxz 10"
-
Bheam
-
VimDiesel
Title: Mozilla Community Pastebin/gCLKHNdV (Console/Bash Session)
-
Bheam
what does that tell you :p
-
jgh
that the iostat is far less informative than the linux one, and that your disk(s) - are they both on one phys? - are not performing well
-
jgh
it doesn't give service times, even without a R/W split, unfortunately
-
Bheam
not sure if they are both on one phys, how do i tell
-
jgh
but if that's a typical sample, 100% busy disk just isn't good
-
Bheam
problem is it keeps happening with different physical computers
-
jgh
do you have >1 SSD?
-
Bheam
yea 2
-
Bheam
in a zpool
-
Bheam
installing smartmon
-
jgh
must be a config thing, if it's on several systems
-
tykling
have you checked with gstat(8)? have you checked "top -m io"? if this is zfs and a vm have you considered volblocksize of the zvol vs blocksize in the vm fs, a mismatch can cause bad times
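
(A rough sketch of the block size mismatch idea, as a simplified model only: it assumes every guest write forces the host to rewrite each whole host block it touches, and ignores caching, compression and ZFS batching writes together. The function names are made up. 128 KiB is the ZFS default recordsize, which is what applies to a file-backed image; a zvol would use volblocksize instead.)

```python
def host_bytes_touched(offset, length, host_block):
    """Bytes of whole host blocks covered by one guest write."""
    first = offset // host_block
    last = (offset + length - 1) // host_block
    return (last - first + 1) * host_block

def amplification(offset, length, host_block):
    """Ratio of host bytes rewritten to guest bytes written."""
    return host_bytes_touched(offset, length, host_block) / length

KIB = 1024
print(amplification(0, 4 * KIB, 128 * KIB))       # aligned 4 KiB write, 128 KiB host block
print(amplification(0, 4 * KIB, 4 * KIB))         # matching block sizes
print(amplification(2 * KIB, 4 * KIB, 4 * KIB))   # misaligned write straddles two blocks
```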
-
Bheam
top -m io seems to not display everything as it's most of the time at 0% but sometimes jumps to 100%, while htop is almost always at 100+%
-
Bheam
also top -m io shows something when i launch it, looks like a total, then switches to showing very low values
-
Bheam
tykling: it's an issue that started after a few weeks of running the vm. also it's so bad i don't think blocksize is the issue
-
Bheam
does zfs ever rebuild or resync zpool mirror ?
-
jgh
still worth checking, that blocksize
-
Bheam
zpool status shows all online and no known data errors
-
Bheam
all block sizes should be 512, not sure how to check zvol
-
Bheam
also i've been getting the same issue on other computers running windows / ntfs
-
Bheam
hmm isn't average block erase count over 2 months @ raw value 68 pretty high?
-
tp512
7hr uptime after switching to drm-510-kmod, had dozed off for some of this time, but I was watching quite a bit of youtube which seemed to accelerate this issue triggering when I was on drm-515-kmod
-
tp512
I think I might've found the culprit, unless I've just gotten really lucky about not triggering a page fault
-
tp512
normally it happens in under 3hr
-
tp512
more like unlucky in this case I guess because if it does happen on drm-510-kmod I want it to not waste my time and do it already, so I can know whether the switch has resolved things
-
tp512
will test further tomorrow, and see if I have time to send in a bug report
-
tykling
Bheam: try ordering by the "total" column when running "top -m io" (press "o" and type "total" and press enter inside top), also, if you enabled system hardening you need to run top as root to see everything
-
Bheam
tykling: not sure how to interpret this:
paste.mozilla.org/1bkTZadi
-
VimDiesel
Title: Mozilla Community Pastebin/1bkTZadi (Console/Bash Session)
-
tykling
Bheam: press "a"
-
tykling
Bheam: a single bhyve process is using 100% of your io
-
tykling
if you press a you can see the whole commandline of stuff
-
Bheam
i know
-
Bheam
i have nothing running on host, everything is in vm
-
Bheam
the host does the actual hw io for disks though, since the vm is using file based image
-
jgh
have you tested the host itself, as opposed to what a VM sees of it?
-
Bheam
jgh: i know disk io is an issue as the host is lagging too when the vm struggles
-
jgh
that... does not quite answer the question
-
luna_
nuug.no/aktiviteter/20240312-freebsd-and-absurdity-of-security FreeBSD and Security talk in Oslo later today, in 4 hours. not sure if it's in English or Norwegian however
-
VimDiesel
Title: FreeBSD and the absurdities of security compliance
-
nimaje
is there some stream? as the abstract is in english I would expect the talk to be in english too, well, if there is a stream I will see that later
-
luna_
nimaje: think they will stream on their Youtube channel
-
luna_
-
VimDiesel
Title: NUUG - YouTube
-
luna_
asked in their irc channel coming back if i get an answer
-
luna_
Foredraget vil foregå på engelsk. helps if one can read Norwegian text, it will be held in English
-
voy4g3r2
throwing this idea out here, this is how i would like to setup my network topology.. does anyone see GLARING issues with this?
1drv.ms/i/s!Ag86nuiRCza3jahmMr_UcUmhyKk9Lg?e=pOg2o3 objective: To have two networks, one with general purpose services and one that has backup/server services that are NOT available to the general network team.. only servers..
-
nimaje
-
VimDiesel
Title: FreeBSD and the absurdities of security compliance,med Eirik Øverby - YouTube
-
luna_
ah nimaje you were faster
-
ZedHedTed
that looks fun
-
thegman
does iwlwifi(4) support hostap
-
thegman
ive been trying to setup an opnsense router in a bhyve vm
-
tp512
so if I'm gonna send in a bug report regarding these page faults, I'm guessing I send it to drm-kmod, since it seems like the 515 drivers trigger it but the 510 drivers don't, despite the fact that the actual fault occurs in linux_rcu?
-
tp512
without me building like an unoptimized debug kernel and actually diving in I'm not sure if drm-515 is doing what it's supposed to and linux_rcu is breaking, or the other way around
-
rtprio
thegman: i don't see why not; what happens when you try hostap
-
johnjaye
tp512: how much kernel knowledge do you need to do that kind of analysis?
-
tp512
johnjaye: honestly not sure. I'd need to figure out where the dangling pointer comes from
-
thegman
im not booted into freebsd right now but ifconfig caps wlan0 didnt show anything related to hostap
-
tp512
pretty sure a dangling pointer is to blame. linux_rcu_cleaner_func goes through a linked list to "dispatch callbacks", but one of the function pointers in this list doesn't point to a valid area of memory, kernel execution jumps to it and immediately page faults
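
(A toy model of the failure described above, not the actual LinuxKPI code: a queue of deferred callbacks where one entry's object is freed and its memory reused before dispatch runs. All names and the 0xDEADBEEF value are invented for illustration. The `callable` check stands in for the page fault, since a real kernel would just jump to the address.)

```python
def free_object(head):
    # Model of use-after-free: the memory is reused, so the function
    # pointer field now holds garbage.
    head["func"] = 0xDEADBEEF

def dispatch_callbacks(queue):
    """Walk the queue and call each entry's callback."""
    outcomes = []
    for head in queue:
        func = head["func"]
        if callable(func):
            outcomes.append(func(head))
        else:
            # The kernel has no such check; it would jump here and fault,
            # with the fault address equal to the instruction pointer.
            outcomes.append("page fault at " + hex(func))
    return outcomes

good = {"name": "a", "func": lambda h: "released " + h["name"]}
stale = {"name": "b", "func": lambda h: "released " + h["name"]}
queue = [good, stale]
free_object(stale)  # freed while still sitting in the queue
print(dispatch_callbacks(queue))
```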
-
thegman
and hostap didnt work when i tried it although i did probably configure it incorrectly
-
tp512
I don't know exactly how amdgpu is interacting with the rcu stuff, I'd probably need to make a debug build of that module as well and hope that having so much unoptimized code running doesn't actually get in the way of using this laptop until a crash happens
-
thegman
if i remember right hostap tried to start then it disabled wlan0 then it closed saying "wlan0 is down"
-
thegman
i finally got around to moving my irssi config to freebsd so i dont have to keep rebooting
-
thegman
ifconfig wlan0 create wlandev iwlwifi0 wlanmode hostap says "ifconfig: SIOCIFCREATE2 (wlan0): Operation not supported"
-
CrtxReavr
Which one of you assholes signed me up for Fox News alerts?
-
tp512
so FreeBSD doesn't support the Xbox Series S|X controllers? or is this just SDL being a pain?
-
tp512
seems weird to me that it's not showing up in SDL applications but this PS5 controller apparently works just fine
-
tp512
pretty sure even NetBSD supported the Series X|S controller just out of the box, so I wonder if the driver might be able to be ported over
-
mvee
I'm trying to remove a jail. Location is /usr/local/share/classic. jls shows it is not running. When I try chflags -R 0 /usr/local/jails/classic I get "No such file or directory". How do I remove this jail?