10:09:54 if you have vmware fusion and backups, you want to make sure your VMs are quiesced enough to make a backup. Otherwise the restore may give bad surprises.
14:17:12 more than that, at least on esx (I'm guessing fusion wouldn't be too different) if you take a VM snapshot, or more importantly, remove an old VM snapshot
14:17:32 your I/O to the guest can drop substantially while that's happening
14:17:45 and ZFS _really, really, really, really, REALLY_ does not like this
14:17:57 if it's trying to do any sort of activity
14:18:26 it will happily consume all of your VM's RAM with zios as they back up more and more
17:41:44 [illumos-gate] 16027 NVMe firmware activate timeout should be bumped -- Robert Mustacchi
17:41:44 [illumos-gate] 16029 Clarify and constify nvme DMA attributes -- Robert Mustacchi
18:18:25 jbk: That seems like a bug?
18:52:19 yes, but one that's very hard to fix
18:52:34 because things are so decoupled
18:53:51 ideally, if the queues in the zio scheduler are starting to get backed up (but are still making progress), you'd prefer to either slow new things down (so they're generating less I/O) or just have them fail if they can't tolerate the additional delay
18:54:29 but there didn't seem to be a good way to do that where things couldn't still get by that (so many places that could create zios)
18:55:29 Seems illumos-gate is missing SO_REUSEPORT, is that a bug? OmniOS and SmartOS have that option, maybe it just was not upstreamed and got lost between work.
18:55:44 the one thing we've been messing with (but still don't have results) is for zios marked as 'can fail' to fail those w/ ENOMEM (and not send them out to disk) if freemem drops below lotsfree (since that's probably as good a watermark as any)
18:56:11 SO_REUSEPORT was done for LX. It's possible the non-LX portion could get upstreamed, neirac.
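[Editor's sketch] The watermark idea in the 18:55:44 message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code being tested: `Zio`, `can_fail`, and `admit` are hypothetical names invented here, and `freemem` and `lotsfree` are passed in as plain integers rather than read from the kernel.

```python
import errno
from dataclasses import dataclass


@dataclass
class Zio:
    """Toy stand-in for a ZFS I/O request (hypothetical, not the kernel struct)."""
    name: str
    can_fail: bool  # models the 'can fail' marking mentioned in the chat


def admit(zio: Zio, freemem: int, lotsfree: int) -> int:
    """Decide whether a zio should be queued.

    Returns 0 if the zio may proceed, or errno.ENOMEM if it is marked as
    able to fail and free memory has dropped below the lotsfree watermark.
    Zios that cannot tolerate failure are always admitted.
    """
    if zio.can_fail and freemem < lotsfree:
        return errno.ENOMEM  # fail early instead of sending it to disk
    return 0


# Under memory pressure only the failable request is rejected.
prefetch = Zio("prefetch read", can_fail=True)
sync_write = Zio("sync write", can_fail=False)
low_memory = admit(prefetch, freemem=100, lotsfree=200)
must_run = admit(sync_write, freemem=100, lotsfree=200)
plenty = admit(prefetch, freemem=300, lotsfree=200)
```

As noted in the chat, this would be a mitigation rather than a true fix: requests that cannot fail still queue up, so the backlog can keep growing.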
18:56:47 since the flag suggests that should be safe (but need more testing on that)
18:57:34 and the thinking is that most of the zio's lifetime is likely spent either in queue or executing on disk
18:58:08 but i'm not sure i'd consider it a true fix, but the hope is w/ testing that it's at least a worthwhile improvement in behavior
18:58:14 neirac, danmcd: I have a strong memory that it wasn't put up for a reason and that what's there isn't really useful for non-lx.
18:58:47 There was some work that someone did in the past for trying to port it, but I think we didn't get it across the line and there were some questions on review (probably I dropped something there).
19:05:05 SO_REUSEPORT has a few distinct use cases and it would IMHO require some significant work to do a high-quality implementation for Illumos that covers all of them.
19:07:46 I know it's been used for a clean handoff between processes (no dropped requests around restart) and it's also used for improved MP scalability on linux (sometimes with eBPF code and cpu binding used to keep connections on the same CPU from rx queue handler to socket to user process...)
19:13:28 Seems at least there is an implementation working https://github.com/omniosorg/illumos-omnios/blob/master/usr/src/uts/common/inet/tcp/tcp_opt_data.c#L776
19:15:10 So this all came from lx, which Patrick implemented. I'm taking him at his word as the original author that it was not suitable for upstream as it didn't handle all those cases.
19:17:30 The other bit that I mentioned above was at https://code.illumos.org/c/illumos-gate/+/464, but I think to sommerfeld's point, there's a lot of nuance here and the lx bits were to get things working 'enough'.
19:17:31 → CODE REVIEW 464: 12455 SO_REUSEPORT support (NEW) | https://www.illumos.org/issues/12455
19:20:46 So I guess to put a more explicit point on it, doing what was done for lx is not going to work. If we're going to introduce it and apps are going to use it, it needs to be able to do both of the cases that were described above, and probably will also need to be done properly for other protocols.
19:23:07 Yeah the implementation done for LX was lacking in several ways, and only supported TCP
19:24:20 Being limited to TCP made it possible to do safely, but its behavior was extremely limited, basically working only for the "I want to restart haproxy without dropping the socket on the floor" case
19:25:02 the work in CR 464 expanded it to support UDP as well, but I am not confident that it is safe as written
19:25:26 (since the TCP-only version relied on some exclusion bits which I do not believe apply in the UDP case)
19:25:46 Needless to say, it's been years since I looked closely at that stuff, so take that commentary with a grain of salt.
19:26:55 But I agree with sommerfeld that a robust implementation which covers the use cases desired of SO_REUSEPORT is a substantial undertaking, and probably will require some restructuring of how sockfs interacts with the rest of the netstack to avoid dropping packets/connections on the ground for certain edge cases.
19:28:19 [illumos-gate] 16039 savecore should not be isaexec'd -- Richard Lowe
19:32:27 Anyone have experience with Kioxia CM6 drives?
20:12:44 I know this isn't helpful but man it seems like the state of OpenZFS is on fire right now
20:13:17 2.2.x is just panicking left and right and triggering assertions
20:14:36 The scary part of it all is that most of the optimizations were about loosening the granularity of locks or eliminating them entirely when possible. If people upgrading are just now encountering these issues, this is going to be a painful root cause effort
20:17:04 These variable workloads need to somehow be better captured in ZTS
20:36:54 [illumos-gate] 16012 Update PCI classes and caps to PCI Ver 1.16 -- Robert Mustacchi
23:20:47 richlowe: https://github.com/TritonDataCenter/illumos-joyent/blob/master/usr/src/cmd/cmd-crypto/decrypt/Makefile#L50 -- should that maybe depend on $(PROG) or $(ROOTPROG) ?
23:20:53 (it's the same upstream)
23:25:50 it shouldn't need to, just to create a symlink?
23:25:57 and .make_state will catch the link changing etc.
23:26:02 did I miss something?