-
sjorge
lwp_* family of calls have to do with threading right?
-
sjorge
the zrepl truss outputs are all full of these calls, and the truss output seems to be the same among the crashes i've captured
-
sjorge
back to consistently crashing now 😵
-
sjorge
i wonder if it's some thread timing thing where it screws up
-
sjorge
when trying to draw the tui
-
sjorge
Some more poking about with truss: when zrepl fails it does a call to ioctl(0, TCSETS, 0xC00053D9E0), then a bunch of lwp_park()'s, and then a panic. When it works there are no lwp_park()'s and it does an ioctl(0, TIOCGWINSZ, 0xC00049F768)
-
nahamu
Is anyone else seeing weird behavior of NVMe drives? So far both sjorge and I have seen oddities. I'm wondering what the chances are that it's actually a software issue.
-
sjorge
There was that one person over in #smartos no?
-
nahamu
Yes, but they said it was cmos battery dying
-
sjorge
Ah
-
yuripv
fwiw, virtual nvmes on vmware hypervisors seem to work fine for me :D
-
andyf
sjorge - we can replicate the terminal thing now. It doesn't happen if you are not root, or if you use `pfexec`, apparently
-
andyf
Oh, it's `sudo`.. Try `pkg update sudo⊙1` (i.e. downgrade) and that should fix it for you
-
andyf
At least it explains why I couldn't replicate it since I never use sudo
-
andyf
Or, actually, do this: echo 'Defaults !use_pty' > /etc/sudoers.d/nopty
-
jbk
nahamu: i've had an NVMe split into l2 arc / slog that's been working fine (even though the lifetime according to it has been exceeded by a fair amount :P)
-
jbk
i have noticed that apparently some NVMe devices are advertising 256kb block sizes now, which might pose a bit of a problem..
-
nahamu
jbk how are you checking the blocksize?
-
jbk
well noticed in that i've seen others report issues w/ zfs and such devices
-
jbk
someone w/ a micron 7450 reported that
-
jbk
apparently 4k emulated, 256k native
-
fenix
→ BUG 12237: zpool coredump, vdev_open() returns EDOM with disk image file (but not zvol) from bhyve (New) | code.illumos.org/c/illumos-gate/+/347
-
Woodstock
unless ASHIFT_MAX has been raised by now that will likely cause the same issue
-
gitomat
[illumos-gate] 15626 diskinfo assumes too much about topo relationships -- Robert Mustacchi <rm⊙fo>
-
papertigers
nahamu: I have an nvme log device on rpool and haven't seen any issues yet that I am aware of
-
papertigers
it's a Samsung SSD 980 PRO 1TB
-
nahamu
I'm starting to think that the issue is with one of the ports on my breakout card.
-
papertigers
and I have an nvme mirror on my other omnios box. Those nvme are Samsung SSD 970 EVO Plus 500GB
-
jbk
mine is a 960 256GB M2 slot
-
sjorge
andyf 'Defaults !use_pty' does indeed seem to fix it
-
sjorge
I mostly use pfexec in lieu of sudo stuff but I do use sudo -i
-
sjorge
I think that might explain the mixed behavior between it working and not working
-
sjorge
Is that a known bug in the latest sudo or something we're triggering now due to a change?
-
sjorge
papertigers I got the same 980 Pro 1T, I had both drop during a scrub last week. I did switch to them from 960 Pro 512G's a few days before
-
sjorge
I think it might be because those are gen4 NVMe's in a gen3 U.2 slot on my board
-
sjorge
Another scrub after didn't trigger it though ¯\_(ツ)_/¯
-
papertigers
I believe my board only supports gen3 pcie fwiw
-
papertigers
I believe it's in the m.2 slot on the mobo
-
jbk
sjorge: just checksum errors in scrub?
-
sjorge
As far as I could tell the device just poofed out of existence
-
jbk
oh
-
andyf
It's a change in sudo - as of 1.9.14 they changed the default value of the use_pty option to True.
-
sjorge
andyf I see, so that probably never worked for us
-
jbk
so schrödinger's NVMe
-
andyf
Which means that commands are run in a new pty. I have not looked at all into why that causes problems for us.
-
sjorge
perhaps it's also causing issues on non-illumos systems
-
sjorge
seems at least on arch it was also causing some issues
-
nahamu
Well, if I'm going to jinx it I might as well do it sooner rather than later... Moved the new crucial to a different port and the resilver was able to complete. doing another scrub now as a stress test.
-
nahamu
I really hope it's just one of the ports on the card being bad.
-
gitomat
[illumos-gate] 15774 Memory leak in pkcs11_softtoken when used by metaslot -- Matt Barden <mbarden⊙rc>
-
jbk
hrm.. it's been observed that dumping to a zvol is noticeably slower than a raw device... i'm wondering about any techniques for digging into that... i'm guessing dtrace might not work well in a panic context
-
sjorge
have you ruled out the obvious? compression and stuff on the zvol?
-
richlowe
I don't think we're willing to dump on compressed zvols, etc.
-
richlowe
but checking which of the dump bits are disabled in each environment would be good.
-
richlowe
or ideally checking the same machine configured each way
-
richlowe
I seem to recall that how we compress a dump can vary, so if you're trying on different machines, that might fox you
-
sjorge
redmine down for anybody else?
-
sjorge
gerrit is ok so I think it's not an issue on my end
-
yuripv
works for me, though slow
-
rmustacc
Yeah, there has been a recent increase of crawlers, IIUC.
-
nomad
Gotta feed the so-called AI, donchaknow.
-
sjorge
i'll finally be impressed if the ai can create a working driver for illumos by pointing it at a datasheet
-
sjorge
don't think that's gonna happen anytime soon
-
nomad
I didn't say it made any sense.
-
jclulow
I have added some more crawlers to the ignore bits
-
jclulow
Performance should hopefully be a bit more uniform now
-
sjorge
it's loading again for me
-
yuripv
yep, much faster at the moment
-
richlowe
the load was very sporadic, too