-
<warden> Hi all. The first OmniOS server I put into production reboots from time to time... I suspect the cause lies in ipfilter, which I have enabled and configured, but I haven't been able to track the issue down to its root cause.
-
<warden> Here is the fmdump output and a basic analysis of the core dump file: paste.omnios.org/?eb3d39a5705b95cf#…2LgPPKsa5BJqCVMELdmj1UJ46Y1JVwftz6N
-
<warden> I'd be very grateful if someone more skilled than me could suggest how to analyze the issue further. Thanks! :)
-
<rzezeski> warden: running `::status` in mdb would be helpful
-
<rzezeski> It appears to be a problem with grabbing a mutex, but I see nothing obvious in fr_tcp_age(). Makes me wonder if a stack frame is missing, because it does call fr_movequeue(), which does try to enter a mutex.
-
<rzezeski> Also, wow, didn't know we had some K&R C still floating around.
-
<warden> Thank you rzezeski, here is the `::status` output: paste.omnios.org/?d6088f55f52d9cb7#…hu91NQpiRgRkTCzXWRqsTdhZ8qKWvMJcAcS
-
<warden> please note that all these debug commands are arcane to me... threat me like a baby! :)
-
<warden> s/threat/treat/
-
<rzezeski> Okay, so a null pointer. Let's look at the tqe argument passed to fr_movequeue(). Run this in mdb: `fffffe599ee636a0::print -t ipftqent_t`
-
<rzezeski> It might also be nice to keep adding to the same paste, if that's possible, so everything is in one spot.
-
<warden> Thanks. PrivateBin doesn't seem to let me modify an existing paste, so I merged all the text here: box.messagecloud.it/index.php/s/2Ag…zmi6M8/download/OmniOS_coredump.txt
-
<rzezeski> Okay, now let's look at the tqe_ifq member. Run this in mdb: `fffffe599ee636a0::print -t ipftqent_t tqe_ifq |::print -t ipftq_t`
-
<warden> great, I've just updated the text file (it's also readable in the browser at this URL: box.messagecloud.it/index.php/s/2Agx3n367zmi6M8 )
-
<rzezeski> warden: weird, it seems a NULL pointer was passed to mutex_enter(), but the code (albeit very confusing) doesn't look like it should do that, so I might be making the wrong assumption. We are missing a stack frame because mutex_enter() is optimized and doesn't create one. We should be able to look at it by doing something like: `fffffe007d7fcbd0,30/nap`
-
<warden> rzezeski: I barely understand what you mean, but I've just updated the text file with the latest output. Thanks!