09:10:15 lwp_* family of calls have to do with threading right?
09:11:50 the zrepl truss outputs are all full of these calls, and the truss output seems to be the same among the crashes i've captured
09:12:05 back to consistently crashing now 😵
09:12:40 i wonder if it's some thread timing thing where it screws up
09:12:54 when trying to draw the tui
10:18:23 Some more poking about with truss: when zrepl fails it does a call to ioctl(0, TCSETS, 0xC00053D9E0), then a bunch of lwp_park()'s, and then a panic; when it works there are no lwp_park()'s and it does an ioctl(0, TIOCGWINSZ, 0xC00049F768)
11:24:42 Is anyone else seeing weird behavior of NVMe drives? So far both sjorge and I have seen oddities. I'm wondering what the chances are that it's actually a software issue.
11:25:06 There was that one person over in #smartos, no?
11:25:27 Yes, but they said it was the cmos battery dying
11:25:31 Ah
11:59:29 fwiw, virtual nvmes on vmware hypervisors seem to work fine for me :D
12:11:38 sjorge - we can replicate the terminal thing now. It doesn't happen if you are not root, or if you use `pfexec`, apparently
12:45:28 Oh, it's `sudo`.. Try `pkg update sudo@1` (i.e. downgrade) and that should fix it for you
12:46:26 At least it explains why I couldn't replicate it since I never use sudo
12:51:53 Or, actually, do this: echo 'Defaults !use_pty' > /etc/sudoers.d/nopty
13:26:24 nahamu: i've had an NVMe split into l2 arc / slog that's been working fine (even though the lifetime according to it has been exceeded by a fair amount :P)
13:27:04 i have noticed that apparently some NVMe devices are advertising 256kb block sizes now, which might pose a bit of a problem..
13:27:25 jbk how are you checking the blocksize?
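For context on why a 256k native block size could be awkward: ZFS records a vdev's sector size as an "ashift", the base-2 logarithm of the size in bytes, and the code caps that value at a compile-time maximum. The following is only a rough sketch of that arithmetic. The function names are hypothetical, and the limit of 16 used here is an assumption for illustration, not a value taken from this log or verified against illumos-gate.

```python
def ashift_for(block_size):
    """Return log2(block_size) for a power-of-two block size in bytes."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a positive power of two")
    return block_size.bit_length() - 1


# ASSUMPTION: illustrative cap only; check the ASHIFT_MAX definition in
# the source tree you actually run.
ASSUMED_ASHIFT_MAX = 16


def fits(block_size, ashift_max=ASSUMED_ASHIFT_MAX):
    """True if a device with this sector size stays within the assumed cap."""
    return ashift_for(block_size) <= ashift_max


if __name__ == "__main__":
    for size in (512, 4096, 256 * 1024):
        print(size, ashift_for(size), fits(size))
```

With these numbers, 4k emulated sectors give an ashift of 12, while a 256k native size gives 18, which would exceed the assumed cap.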
13:27:43 well noticed in that i've seen others report issues w/ zfs and such devices
13:28:46 someone w/ a micron 7450 reported that
13:29:21 apparently 4k emulated, 256k native
13:31:05 jbk: fyi https://www.illumos.org/issues/12237
13:31:06 → BUG 12237: zpool coredump, vdev_open() returns EDOM with disk image file (but not zvol) from bhyve (New) | https://code.illumos.org/c/illumos-gate/+/347
13:32:22 unless ASHIFT_MAX has been raised by now that will likely cause the same issue
13:56:19 [illumos-gate] 15626 diskinfo assumes too much about topo relationships -- Robert Mustacchi
14:02:04 nahamu: I have an nvme log device on rpool and haven't seen any issues yet that I am aware of
14:02:34 it's a Samsung SSD 980 PRO 1TB
14:02:39 I'm starting to think that the issue is with one of the ports on my breakout card.
14:03:21 and I have an nvme mirror on my other omnios box. Those nvme are Samsung SSD 970 EVO Plus 500GB
14:04:24 mine is a 960 256GB M2 slot
14:19:06 andyf 'Defaults !use_pty' does indeed seem to fix it
14:19:21 I mostly use pfexec in lieu of sudo stuff but I do use sudo -i
14:19:35 I think that might explain the mixed behavior between it working and not working
14:19:54 Is that a known bug in the latest sudo or something we're triggering now due to a change?
14:20:41 papertigers I got the same 980 Pro 1T, I had both drop during a scrub last week. I did switch to them from 960 Pro 512G's a few days before
14:21:01 I think it might be because those are gen4 NVMe's in a gen3 U.2 slot on my board
14:21:17 Another scrub after didn't trigger it though ¯\_(ツ)_/¯
14:25:05 I believe my board only supports gen3 pcie fwiw
14:25:19 I believe it's in the m.2 slot on the mobo
14:25:59 sjorge: just checksum errors in scrub?
14:26:22 As far as I could tell the device just poofed out of existence
14:26:28 oh
14:26:33 It's a change in sudo - as of 1.9.14 they changed the default value of the use_pty option to True.
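For readers unfamiliar with the ioctl seen in the truss output, here is a minimal sketch of what a TIOCGWINSZ query on a pty looks like. It is not a diagnosis of the zrepl crash, which nobody in the log has root-caused. It only shows that a program running on a pty sees whatever window size was set on that pty. The helper names are hypothetical, and the example assumes a Unix-like system where Python's `pty`, `termios` and `fcntl` modules are available.

```python
import fcntl
import os
import pty
import struct
import termios

# struct winsize is four unsigned shorts: rows, cols, xpixel, ypixel.
WINSIZE_FMT = "HHHH"


def get_winsize(fd):
    """Return (rows, cols) for the terminal on fd using TIOCGWINSZ."""
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * struct.calcsize(WINSIZE_FMT))
    rows, cols, _xpix, _ypix = struct.unpack(WINSIZE_FMT, buf)
    return rows, cols


def set_winsize(fd, rows, cols):
    """Set the terminal size on fd using TIOCSWINSZ."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack(WINSIZE_FMT, rows, cols, 0, 0))


def demo():
    """Create a pty, size it from the master side, read it from the slave side."""
    master, slave = pty.openpty()
    try:
        before = get_winsize(slave)   # whatever the new pty starts with
        set_winsize(master, 24, 80)   # what the pty's owner would normally do
        after = get_winsize(slave)    # what a TUI on the pty would then see
        return before, after
    finally:
        os.close(master)
        os.close(slave)


if __name__ == "__main__":
    print(demo())
```

A TUI that draws based on this size depends on whoever created the pty having sized it, which is one reason running a command in a fresh pty can behave differently from running it directly on the user's terminal.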
14:26:55 andyf I see, so that probably never worked for us
14:26:56 so schrödinger's NVMe
14:27:03 Which means that commands are run in a new pty. I have not looked at all into why that causes problems for us.
14:27:20 perhaps it's also causing issues for !not illumos
14:27:37 --> https://github.com/sudo-project/sudo/issues/258
14:32:18 seems at least on arch it was also causing some issues
15:06:47 Well, if I'm going to jinx it I might as well do it sooner rather than later... Moved the new crucial to a different port and the resilver was able to complete. doing another scrub now as a stress test.
15:07:11 I really hope it's just one of the ports on the card being bad.
17:53:54 [illumos-gate] 15774 Memory leak in pkcs11_softtoken when used by metaslot -- Matt Barden
18:18:51 hrm.. it's been observed that dumping to a zvol is noticeably slower than a raw device... i'm wondering about any techniques for digging into that... i'm guessing dtrace might not work well in a panic context
18:28:15 have you ruled out the obvious? compression and stuff on the zvol?
18:32:15 I don't think we're willing to dump on compressed zvols, etc.
18:32:34 but checking which of the dump bits are disabled in each environment would be good.
18:32:43 or ideally checking the same machine configured each way
18:33:10 I seem to recall that how we compress a dump can vary, so if you're trying on different machines, that might fox you
20:39:13 redmine down for anybody else?
20:39:48 gerrit is ok so I think it's a bit of an issue on my end
20:43:03 works for me, though slow
20:44:58 Yeah, there has been a recent increase of crawlers, IIUC.
20:46:52 Gotta feed the so-called AI, donchaknow.
20:50:08 i'll finally be impressed if the ai can create a working driver for illumos by pointing it at a datasheet
20:50:21 don't think that's gonna happen anytime soon
20:59:02 I didn't say it made any sense.
21:15:43 I have added some more crawlers to the ignore bits
21:15:54 Performance should hopefully be a bit more uniform now
21:22:18 it's loading again for me
21:54:58 yep, much faster at the moment
21:58:27 the load was very sporadic, too