-
sommerfeld
jclulow: so many different ways of getting stack traces!
-
jclulow
It's true
-
sommerfeld
ok, guess I should figure out who I can enlist to test it.
-
danmcd
Did someone mention pstack(1) already?
-
sommerfeld
(and, really, if the patch that wacki put in openindiana (checks if certain fields are NULL) is the right fix or if the crash is really a sign that something else is missing).
-
sommerfeld
szilard: what sort of system are you seeing this on?
-
sommerfeld
(I've not seen this bug myself, and I kind of want to understand the problem a little better. Wacki's patch adds 3 more tests against NULL and it's not clear which one his system was tripping over.)
-
sommerfeld
if my (self-built) copy of the library matches the one on omnios, szilard's fmd was faulting fetching the value of nodes->nodeTab[0] suggesting that nodeTab was NULL.
-
jclulow
I mean, I think defensive NULL checks in anything you get out of libxml is always appropriate
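A minimal sketch of the kind of defensive check being discussed, not the actual patch. `node_set_t` and `first_node()` are hypothetical stand-ins; only the field names `nodeNr` and `nodeTab` are modeled on libxml2's node set, and the sketch makes no claim about which of the three NULL tests the real fix needs.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a libxml2 node set. Only the two fields
 * relevant to the crash are modeled; this is not the real xmlNodeSet.
 */
typedef struct node_set {
	int nodeNr;		/* number of nodes in the set */
	void **nodeTab;		/* array of node pointers; may be NULL */
} node_set_t;

/*
 * Return the first node in the set, or NULL if the set itself is missing,
 * reports no nodes, or has no table. Dereferencing nodeTab[0] without
 * these checks is the fault described above.
 */
static void *
first_node(const node_set_t *nodes)
{
	if (nodes == NULL || nodes->nodeNr <= 0 || nodes->nodeTab == NULL)
		return (NULL);
	return (nodes->nodeTab[0]);
}
```

A caller would treat a NULL return as "query matched nothing" and bail out rather than index into the table.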
-
sommerfeld
sure, but is this check covering up a bug in whatever generates this xml? (probably...)
-
sommerfeld
szilard: is there a "/var/run/fab-xlate-topo.xml" file on your system? if so, could you attach a copy of it to bug 17213?
-
sommerfeld
(looking like the xml was built by fmd itself from fab_update_topo())
-
jclulow
sommerfeld: Yeah, sorry, I definitely agree. I just mean even if we don't expect it to be NULL we should check anyway haha
-
jclulow
the perils of the libxml programming model etc
-
szilard_
sommerfeld: it is a Lenovo m710Q with intel i5-7500T
-
szilard_
sommerfeld: yep, the file exists on my system. I'll attach it.
-
szilard_
File attached to the ticket.
-
sommerfeld
szilard_: thank you!
-
szilard_
I have managed to compile 'turbo' editor on OmniOS:
i.imgur.com/8x2edcw.png
-
sommerfeld
i'm now in the "how does this ever work" stage of investigating
illumos.org/issues/17213
-
fenix
→ BUG 17213: fmd crashes on HP z& G4 in usr/src/cmd/fm/modules/common/fabric-xlate/fx_subr.c:fab_xpath_query() (New) | code.illumos.org/c/illumos-gate/+/4107
-
sommerfeld
there's a clearly missing call to fflush() in fab_update_topo()
-
sommerfeld
it opens a file descriptor, wraps it in a FILE with fdopen, passes it to topo_xml_print(), and then passes the file name to xmlParseFile() without ever calling fflush() or fclose(), so the tail end of the xml is going to be sitting in the FILE buffer
-
sommerfeld
I'm not currently set up to build for omnios; could someone build a fabric-xlate.so for szilard with the patches from 4107 linked above?
-
sommerfeld
builds in usr/src/cmd/fm/modules/common/fabric-xlate
-
sommerfeld
installs into /usr/lib/fm/fmd/plugins/fabric-xlate.so
-
andyf
sommerfeld - I can build a signed hotfix. Do you know which omnios version it needs to be for?
-
szilard_
SunOS omnios 5.11 omnios-r151052-5ce47a2ab6 i86pc i386 i86pc
-
szilard_
this is what i am using
-
sommerfeld
andyf: thanks!
-
sommerfeld
BTW it would appear that one prerequisite for the bug appearing is that the machine has to have some sort of event that triggers the diagnosis engine to invoke this machinery.
-
andyf
It'll take half an hour or so, I'll post here when it's ready.
-
sommerfeld
I can tickle the missing fflush (and get xml parse errors in /var/svc/log/system-fmd:default.log) if I call fab_update_topo at the end of _fmd_init() in fabric-xlate.c
-
sommerfeld
so I think that resolves the "how did this ever work" issue.
-
sommerfeld
szilard_: can you run fmdump -e on the affected system? I'd expect to see a few lines with "ereport.io.pci" or "ereport.io.pciex"..
-
szilard_
-
sommerfeld
okay, that's consistent with my diagnosis of what's going on. thanks!
-
danmcd
Oooh cool!
-
andyf
szilard_ - something like this should get you updated bits: pfexec pkg apply-hot-fix --be-name=fabric-xlate hf.omnios.org/r52/fabric-xlate.p5p
-
andyf
It will create a new boot environment so you will need to reboot into the new bits and pieces.
-
szilard_
andyf: doing it right now
-
szilard
It seems to be working. At least I don't see any errors in the dmesg output.
-
szilard
"svcs -xv" output is empty.
-
szilard
/var/run/fab-xlate-topo.xml
-
szilard
^^^ this file doesn't exist
-
sommerfeld
szilard: /var/run/fab-xlate-topo.xml is deleted if it was successfully created and parsed.
-
sommerfeld
so that's to be expected.
-
szilard
sommerfeld: I've been using OmniOS/illumos since December with no previous experience, and so far I've had only positive experiences.
-
szilard
Both with the system and with the support.
-
danmcd
szilard: Also `/var/run` is a ramdisk fs like /tmp (it was created so a system service could put temporary files in a place that isn't so public as /tmp/), so it disappears every reboot.
-
szilard
danmcd: makes sense. thanks.