00:05:06 jclulow: so many different ways of getting stack traces!
00:05:12 It's true
00:05:44 ok, guess I should figure out who I can enlist to test it.
00:09:10 Did someone mention pstack(1) already?
00:09:27 (and, really, if the patch that wacki put in openindiana (checks if certain fields are NULL) is the right fix or if the crash is really a sign that something else is missing).
00:09:57 szilard: what sort of system are you seeing this on?
00:18:30 (I've not seen this bug myself, and I kind of want to understand the problem a little better. Wacki's patch adds 3 more tests against NULL and it's not clear which one his system was tripping over..)
00:33:58 if my (self-built) copy of the library matches the one on omnios, szilard's fmd was faulting fetching the value of nodes->nodeTab[0], suggesting that nodeTab was NULL.
00:41:04 I mean, I think defensive NULL checks in anything you get out of libxml are always appropriate
00:44:17 sure, but is this check covering up a bug in whatever generates this xml? (probably...)
00:47:43 szilard: is there a "/var/run/fab-xlate-topo.xml" file on your system? if so, could you attach a copy of it to bug 17213 ?
00:54:24 (looking like the xml was built by fmd itself from fab_update_topo())
01:19:14 sommerfeld: Yeah, sorry, I definitely agree. I just mean even if we don't expect it to be NULL we should check anyway haha
01:19:20 the perils of the libxml programming model etc
06:42:40 sommerfeld: it is a Lenovo m710Q with intel i5-7500T
06:44:57 sommerfeld: yep, the file exists on my system. I'll attach it.
07:34:56 File attached to the ticket.
14:09:15 szilard_: thank you!
14:49:18 I have managed to compile 'turbo' editor on OmniOS: https://i.imgur.com/8x2edcw.png
16:25:18 i'm now in the "how does this ever work" stage of investigating https://www.illumos.org/issues/17213
16:25:19 → BUG 17213: fmd crashes on HP z& G4 in usr/src/cmd/fm/modules/common/fabric-xlate/fx_subr.c:fab_xpath_query() (New) | https://code.illumos.org/c/illumos-gate/+/4107
16:38:38 there's a clearly missing call to fflush() in fab_update_topo()
16:44:46 it opens a file descriptor, wraps it in a FILE with fdopen, passes it to topo_xml_print(), and then passes the file name to xmlParseFile() without ever calling fflush() or fclose(), so the tail end of the xml is going to be sitting in the FILE buffer
16:49:25 I'm not currently set up to build for omnios; could someone build a fabric-xlate.so for szilard with the patches from 4107 linked above?
16:49:39 builds in usr/src/cmd/fm/modules/common/fabric-xlate
16:50:21 installs into /usr/lib/fm/fmd/plugins/fabric-xlate.so
18:36:32 sommerfeld - I can build a signed hotfix. Do you know which omnios version it needs to be for?
18:39:36 SunOS omnios 5.11 omnios-r151052-5ce47a2ab6 i86pc i386 i86pc
18:39:48 this is what i am using
18:40:57 andyf: thanks!
18:42:45 BTW it would appear that one prerequisite for the bug appearing is that the machine has to have some sort of event that triggers the diagnosis engine to invoke this machinery.
18:42:46 It'll take half an hour or so, I'll post here when it's ready.
18:44:26 I can tickle the missing fflush (and get xml parse errors in /var/svc/log/system-fmd:default.log) if I call fab_update_topo at the end of _fmd_init() in fabric-xlate.c
18:46:56 so I think that resolves the "how did this ever work" issue.
18:50:34 szilard_: can you run fmdump -e on the affected system? I'd expect to see a few lines with "ereport.io.pci" or "ereport.io.pciex"..
18:58:51 sommerfeld: https://pastebin.com/raw/9Qf8kUdH
19:01:37 okay, that's consistent with my diagnosis of what's going on. thanks!
19:01:38 Oooh cool!
19:33:07 szilard_ - something like this should get you updated bits; pfexec pkg apply-hot-fix --be-name=fabric-xlate https://hf.omnios.org/r52/fabric-xlate.p5p
19:33:30 It will create a new boot environment so you will need to reboot into the new bits and pieces.
19:39:26 andyf: doing it right now
19:47:59 It seems to be working. At least I don't see any errors in the dmesg output.
19:50:05 "svcs -xv" output is empty.
19:50:29 /var/run/fab-xlate-topo.xml
19:50:40 ^^^ this file doesn't exist
19:59:51 szilard: /var/run/fab-xlate-topo.xml is deleted if it was successfully created and parsed.
19:59:55 so that's to be expected.
20:04:59 sommerfeld: I have been using OmniOS/Illumos since December, with no previous experience, and so far I've had only positive experiences.
20:05:14 Both with the system and with the support.
20:18:40 szilard: Also `/var/run` is a ramdisk fs like /tmp (it was created so a system service could put temporary files in a place that isn't as public as /tmp/), so it disappears every reboot.
21:20:48 danmcd: makes sense. thanks.