04:12:14 Hi! (I chat here quite rarely, last time it was still on freenode...). I bumped into what I assume was a panic: system became unresponsive over network for a few minutes and after I was able to get back into it, uptime was reset and I've got a system mail about Fault Management Event. And I was able to reproduce the fault scenario another time (I didn't try to repeat it more since it's my only SmartOS
04:12:20 box). Should I describe the case in bugtracker or here first, so somebody'd be able to say if it's something worth fixing? Scenario itself is related to running snoop on vnics over etherstub
04:13:20 (I searched for something similar on bugtracker, but didn't find anything resembling what I did and saw)
05:52:59 gemelen: have you looked for a core dump, yet?
15:07:46 gemelen: I would check to see if a crash dump was created. It'd usually be under something like /var/crash/volatile and have a name like vmdump.
15:11:08 yep, I could see two dump files
15:12:07 OK, so that'll have information about what went wrong.
15:12:57 A useful starting point is to run savecore -f vmdump. somewhere, which will decompress the dump, creating two files. With that we can get a starting point of information about what's going wrong. Normally my default is to run something like mdb -e '::status; $C'
15:14:34 Sounds good. Could you please recommend any guide with more detailed instructions (if any)? I'd understand what to do, just don't have experience dealing with panics on this platform
15:21:27 There's https://illumos.org/books/mdb which talks about the debugger itself.
15:21:40 There's https://illumos.org/books/dev/debugging.html#panics which has a little bit about showing the example of what I described.
15:22:30 But I don't have a good guide off hand on how to really go diagnose this. The hope is that those two commands with mdb will give us a good starting point. From there, we can either suggest some follow ups or if you can share the dump, someone may be able to look.
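The savecore and mdb steps described above can be sketched roughly as the following shell function. This is only an illustration under stated assumptions: the function names `run` and `triage_dump`, the `DRY_RUN` switch, the dump suffix, and the output directory are all hypothetical and do not come from the conversation. Only `savecore -f` and `mdb -e '::status; $C'` are taken from it.

```shell
#!/bin/sh
# Sketch: expand a compressed crash dump and print the panic summary.
# Assumption: the caller supplies the vmdump file and an existing output
# directory; neither path was given in the conversation.

run() {
    # With DRY_RUN=1, only print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

triage_dump() {
    dump=$1        # compressed dump, e.g. a vmdump file under /var/crash/volatile
    outdir=$2      # existing directory with enough free space
    n=${dump##*.}  # numeric suffix of the dump file

    # savecore -f decompresses the dump into unix.N and vmcore.N in outdir.
    run savecore -f "$dump" "$outdir" || return 1

    # ::status prints the panic message; $C prints the stack with frame pointers.
    run mdb -e '::status; $C' "$outdir/unix.$n" "$outdir/vmcore.$n"
}
```

Calling it with `DRY_RUN=1` first shows the two commands without touching the dump, which may be useful on a box where disk space is tight.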
15:30:18 No problem. I'll look myself at first. There is no problem to share dumps, nothing special or confidential about this system. So I'll be back with questions if they'd arise. Thanks
15:33:13 If you can share the stack and status output, that'll at least let us tell you if it's a known bug or not.
15:40:01 oh, yes, give me a few minutes
15:42:13 Take your time, it's at your convenience. It'll be a bit before I could look at anything myself.
15:47:57 https://gist.github.com/gemelen/105c32e6a47285a85b802a0cf7bcac1c
16:52:56 Interesting...
17:05:33 sounds similar to what was fixed in https://github.com/illumos/illumos-gate/commit/85dff7a05711e1238299281f8a94d2d40834c775
17:05:34 → GitHub commit 85dff7a: 15167 Panic when halting a zone with self-created links 15407 (committed)
17:08:09 https://github.com/TritonDataCenter/illumos-joyent/issues/155
17:08:11 The release is very new though: joyent_20250206T001102Z
17:11:34 @gemelen can you email me (danmcd⊙mi) the reproduction?
17:43:56 danmcd: I've added the steps in a comment to the gist.
17:50:17 Oooh thank you. Etherstub use like that isn't tested ... and you say `snoop -z` (A SmartOS specific change)?
17:50:44 I *think* I know what happened, but I'll have to find out myself for sure.
17:51:21 yes, I was running it over ifaces in zones, which are not available to snoop from gz otherwise without that flag
17:51:50 I'll share a link to dump in a moment, if it'd be useful
17:51:55 The `snoop -z` you're running is likely causing the extra reference that is triggering the VERIFY.
17:52:58 As a workaround, you could `zlogin $ZONE snoop -d ....` instead. That way when the vmadm stop happens, the snoop process dies first BEFORE datalinks get unloaded.
17:53:14 yeah, makes sense
17:54:39 just a short disclaimer: it's my personal box, so no pressure of any production. and I'm open to help with more info if required
17:56:14 The use case is interesting, however, and deserves some deeper dives.
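The `zlogin` workaround suggested above might look like the following sketch. The function name `snoop_in_zone`, its two arguments, and the `ZLOGIN` override are hypothetical; the conversation does not say which datalink was passed to `snoop -d`, so the link name is left as a caller-supplied parameter.

```shell
#!/bin/sh
# Sketch: run snoop inside the zone instead of using `snoop -z` from the
# global zone, so the snoop process exits when the zone is stopped.

snoop_in_zone() {
    zone=$1   # zone name as zlogin expects it (assumption: supplied by caller)
    link=$2   # datalink name as seen inside the zone (assumption)

    # ZLOGIN can be set to "echo zlogin" to preview the command line
    # without running it; by default the real zlogin is used.
    ${ZLOGIN:-zlogin} "$zone" snoop -d "$link"
}
```

The zone name and link name are passed through unchanged, so whatever naming the system uses for its zones and vnics applies here as well.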
17:57:21 it felt natural: I have a network config to fix and it was reasonable to sniff what's going on between zones (as in a router and a regular client)
17:58:03 I don't think it's unnatural, I do think it's not been used on SmartOS all that much before, esp. in combination with `snoop -z`
17:58:27 Dumb question: Your zones... are they native, LX, KVM, or BHYVE?
17:58:50 And yes, the dump link would be helpful.
17:59:26 I started with one native "router" zone and one "bhyve" client, but the second was with both native zones
17:59:50 I'd like the second dump, please? Easier to reproduce here.
18:00:02 sure
18:06:31 danmcd: https://gemelen.net/share/vmdump.1 (not sure if it makes sense to try to compress it again)
18:06:44 It does not. Thank you.
18:07:43 I'm downloading it now, but I'm out for the rest of the day starting in less than an hour. Thank you for bringing this up. I *do* wonder if it can be recreated merely by using `snoop -z` or not?
18:09:09 yes, it should be
18:09:47 I think I just induced it myself with a quick little try.
18:15:08 Okay, this is all useful data, thank you gemelen. I don't know when I can dig deeply into it, but it's definitely a bug.
18:16:07 glad to help